String matching question


 
# 1  
10-20-2007
String matching question

Folks;
I need help with this:
I have a text file with a lot of lines; each line is a string consisting of a tree of directories. I would like to ignore any lines starting with "#", then grep for an exact match of a string, and then print the matching string depending on whether it has a child directory. Below are the details:

The text file looks something like:

/new/tree/xxx/yyy/zzz
#/new/free/opt/yyy
/aaa/bbb/ccc/
/aaa/bbb/ccc/ddd/eee
/aaa/bbb/ccc/ddd


First, I want to ignore any line that starts with "#".

Second, for EACH line, starting with the first, I want to look for strings that exactly match that line, and then:
1. if the matching string has any children ("directories under the string"), ignore it;
2. if there is no child directory under the string, print the string with the phrase "/hello/every/one" appended to it, and redirect the output to a new text file.
This should be done for every line in the original text file.

Thanks in advance
# 2  
10-20-2007
Usually the recipe for writing good regular expressions is to phrase the problem correctly: most of the time this alone provides the solution.

In your case you were *almost* there already, so this is simple:

First we filter out all the lines starting with "#". This is done with a special regexp device: "^" at the beginning of a regexp means "start of line". That is, "^#" does not mean a caret followed by an octothorpe ("#"), but an octothorpe as the first character of a line. Here is the script with some sample text; only the second sample line, the one starting with "#", is filtered out:

Code:
sed '/^#/d' file > file.changed

Sample input:
this line goes through
# this line is blocked
this line goes through even if it has a # in it
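To try the filter without creating a file first, the three sample lines can be piped in with printf. This is a minimal sketch; the function name strip_comments is a hypothetical wrapper, not part of the original answer:

```shell
#!/bin/sh
# strip_comments (hypothetical name): copy stdin to stdout, dropping every
# line whose FIRST character is "#"; a "#" later in the line is kept.
strip_comments() {
    sed '/^#/d'
}

printf '%s\n' 'this line goes through' \
              '# this line is blocked' \
              'this line goes through even if it has a # in it' |
    strip_comments
```

A line with a blank in front of the "#" is not removed, because "^#" requires the "#" to be the very first character of the line.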

Now for the next problem: match a line with exact content and print it (to a file). Your problem with the child directories could be stated as "match a line with this content and no additional content". We achieve this with a similar device: "$" at the end of a regexp means "end of line". That is, "x$" does not mean "x followed by a dollar sign", but "x as the last character of a line".

By the way, since we are only searching for specific lines and ignoring all others, we could skip filtering out the lines starting with an octothorpe ("#") altogether, because they will never match anyway. We simply turn off sed's automatic output (the -n option) and print only the matching lines explicitly. I left the filter for the comment lines in, but it is redundant.

Here it is with some sample text; only the last sample line is printed:

Code:
sed -n '/^#/d;/^this is my text to find$/p' file > file.changed

Sample input:
# this line is blocked by rule 1
this is my text to find but with additional text
this is my text to find
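The same expression can be checked from the command line. A minimal sketch, with print_exact as a hypothetical wrapper name; note that without -n, sed would print every input line, and the matching line a second time because of the p command:

```shell
#!/bin/sh
# print_exact (hypothetical name): print only lines that are exactly
# "this is my text to find". -n switches off sed's automatic printing,
# so the p command is the only thing that produces output.
print_exact() {
    sed -n '/^#/d;/^this is my text to find$/p'
}

printf '%s\n' '# this line is blocked by rule 1' \
              'this is my text to find but with additional text' \
              'this is my text to find' |
    print_exact
```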

Now for the last part, adding the extra text: we simply change rule 2, which finds and prints the text, into a substitution. Here we use sed's ability to put the matched text into the output: the "&" in the replacement stands for whatever the search expression actually matched, and the p flag of the s command prints the line only if a substitution was made:

Code:
sed -n '/^#/d;s/^this is my text to find$/& with added text/p' file > file.changed

Sample input:
# this line is blocked by rule 1
this is my text to find but with additional text
this is my text to find

The content of file.changed should be a single line "this is my text to find with added text".
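A quick way to confirm that expected content is to run the substitution on the sample lines and look at stdout instead of the file. This is only a sketch; append_text is a hypothetical wrapper name:

```shell
#!/bin/sh
# append_text (hypothetical name): "&" in the replacement is the whole
# matched text; the p flag prints a line only when the s command succeeded.
append_text() {
    sed -n '/^#/d;s/^this is my text to find$/& with added text/p'
}

printf '%s\n' '# this line is blocked by rule 1' \
              'this is my text to find but with additional text' \
              'this is my text to find' |
    append_text
```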

Back to your problem: your text contains slash characters, and since "/" is also part of the sed syntax (it delimits the expression), you have to "escape" it by putting a "\" in front of it: to match "/usr/bin", use the expression "\/usr\/bin".
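Applied to the path list from the first post, a minimal sketch (the function names are hypothetical) shows the escaped form next to an alternative: the s command accepts any character other than backslash or newline as its delimiter, so with "|" the slashes need no escaping. Both print only /aaa/bbb/ccc/ddd/hello/every/one for this input, because the child /aaa/bbb/ccc/ddd/eee does not satisfy the "$" anchor:

```shell
#!/bin/sh
# Two equivalent ways (hypothetical function names) to match the exact path
# /aaa/bbb/ccc/ddd and append /hello/every/one to it.

# 1. Escape every "/" that belongs to the path with "\".
append_escaped() {
    sed -n 's/^\/aaa\/bbb\/ccc\/ddd$/&\/hello\/every\/one/p'
}

# 2. Use another delimiter for the s command (here "|"), so the slashes
#    in the path need no escaping.
append_pipe_delim() {
    sed -n 's|^/aaa/bbb/ccc/ddd$|&/hello/every/one|p'
}

printf '%s\n' '/aaa/bbb/ccc/' '/aaa/bbb/ccc/ddd/eee' '/aaa/bbb/ccc/ddd' |
    append_escaped
```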

Furthermore, it is usually a good idea to strip unnecessary whitespace from a line before matching it. Most of the time we do NOT want leading or trailing blanks, tabs, etc. to get in the way: "match" and "<tab><blank>match" should be treated as the same. So I would write it this way ("[[:blank:]]" is the POSIX bracket expression for a space or a TAB; with a sed that does not support it, put a literal space and a literal TAB between the brackets instead):

Code:
sed -n 's/^[[:blank:]]*//
        s/[[:blank:]]*$//
        /^#/d
        s/^\/the\/directory\/to\/find$/&\/hello\/every\/one/p' file > file.changed
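To check the whitespace handling, here is a minimal sketch that wraps that script in a hypothetical function and feeds it a line with a TAB and a space in front and a space behind, plus a comment line and a child directory:

```shell
#!/bin/sh
# append_hello (hypothetical name): strip leading/trailing blanks and TABs,
# drop comment lines, then append /hello/every/one to the exact path.
append_hello() {
    sed -n 's/^[[:blank:]]*//
            s/[[:blank:]]*$//
            /^#/d
            s/^\/the\/directory\/to\/find$/&\/hello\/every\/one/p'
}

# First line: TAB + space in front, one space behind.
printf '\t /the/directory/to/find \n#/the/directory/to/find\n/the/directory/to/find/below\n' |
    append_hello
```

Only the first input line produces output, as /the/directory/to/find/hello/every/one.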

One last tip: when you prepare regexps, test them against short sample texts, and make those examples as difficult as you can think of. There are four kinds of lines; try to provide at least one in each category:

- lines that are matched and should be matched;
- lines that are matched but should not be matched;
- lines that are not matched but should be matched;
- lines that are not matched, and correctly so.
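Those four categories can be checked mechanically. The following is a sketch of a tiny test helper; check_regex and the sample script are hypothetical, chosen for illustration. A "WRONG" line with matched=yes is a line that is matched but should not be, and one with matched=no is a line that should be matched but is not:

```shell
#!/bin/sh
# check_regex (hypothetical helper): feed one test line to a "sed -n" script
# and compare whether it produced output ("matched") with what we expect.
#   $1 = sed -n script, $2 = test line, $3 = yes if it SHOULD match, else no
# Prints "ok" or "WRONG"; returns 1 on WRONG so it can be used in scripts.
check_regex() {
    if [ -n "$(printf '%s\n' "$2" | sed -n "$1")" ]; then got=yes; else got=no; fi
    if [ "$got" = "$3" ]; then
        printf 'ok     matched=%s  expected=%s  [%s]\n' "$got" "$3" "$2"
    else
        printf 'WRONG  matched=%s  expected=%s  [%s]\n' "$got" "$3" "$2"
        return 1
    fi
}

# Strip leading blanks, then print a line only if it is exactly /aaa/bbb/ccc/ddd.
script='s/^[[:blank:]]*//;/^\/aaa\/bbb\/ccc\/ddd$/p'

check_regex "$script" '/aaa/bbb/ccc/ddd'      yes  # the exact path
check_regex "$script" '  /aaa/bbb/ccc/ddd'    yes  # leading blanks
check_regex "$script" '/aaa/bbb/ccc/ddd/eee'  no   # a child directory
check_regex "$script" '#/aaa/bbb/ccc/ddd'     no   # a comment line
check_regex "$script" '/aaa/bbb/ccc'          no   # the parent directory
```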

bakunin
# 3  
10-21-2007
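A possible solution with sort and awk:
Code:
# LC_ALL=C: plain byte-order sort, independent of the current locale
LC_ALL=C sort katkota.dat |
awk '

   # Called with the NEXT line in sorted order: "path" (the previous line)
   # is reported as childless unless that next line starts with "path/".
   function print_if_no_child(curr_path) {
      if (match(curr_path, "^" path "/") == 0)
         print path, "/hello/every/one";
      else print path, "directories under the string";
   }

   /^#/         { next }                      # skip comment lines
   path_cnt > 0 { print_if_no_child($0) }     # decide about the previous path
                { gsub(/\/*$/, ""); path = $0 ; path_cnt++ }  # drop trailing slashes, remember path
   END          { print_if_no_child("") }     # the last path has no successor
'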

Input File:
Code:
/new/tree/xxx/yyy/zzz
#/new/free/opt/yyy
/aaa/bbb/ccc/
/aaa/bbb/ccc/ddd/eee
/aaa/bbb/ccc/ddd
/aaa/xxx
/aaa/xxx/yyy/zzz
#end of datas

Output:
Code:
/aaa/bbb/ccc directories under the string
/aaa/bbb/ccc/ddd directories under the string
/aaa/bbb/ccc/ddd/eee /hello/every/one
/aaa/xxx directories under the string
/aaa/xxx/yyy/zzz /hello/every/one
/new/tree/xxx/yyy/zzz  /hello/every/one

Jean-Pierre.
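The script above prints every path with a status. A variant closer to the original request, printing only the childless paths with "/hello/every/one" appended directly (no blank in between), could look like the sketch below. It is a swapped-in technique, not the one used above: instead of comparing each line with its neighbour after sorting, it records every ancestor of every path and then prints the paths that were never recorded. That avoids depending on sort order; with a byte-order sort, for example, a sibling such as /aaa/bbb/ccc-x sorts between /aaa/bbb/ccc and /aaa/bbb/ccc/ddd because "-" comes before "/". The function name and the output file name are hypothetical:

```shell
#!/bin/sh
# leaves_with_suffix (hypothetical name): read directory paths from the files
# given as arguments (or from stdin), skip "#" comment lines, and print every
# path that has NO other path of the list below it, with /hello/every/one
# appended. Output keeps the input order; no sort is needed.
leaves_with_suffix() {
    awk '
        /^#/     { next }                 # skip comment lines
                 { sub(/\/+$/, "") }      # drop trailing slashes
        $0 == "" { next }                 # skip empty lines
        {
            path[++n] = $0
            # mark every ancestor of this path as "has a child":
            # /aaa/bbb/ccc marks /aaa/bbb and /aaa
            p = $0
            while (sub(/\/[^\/]*$/, "", p) && p != "")
                has_child[p] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(path[i] in has_child))
                    print path[i] "/hello/every/one"
        }
    ' "$@"
}

# Sample data from the first post. With a real file the result could be
# redirected, e.g. leaves_with_suffix katkota.dat > new_file.txt
printf '%s\n' '/new/tree/xxx/yyy/zzz' '#/new/free/opt/yyy' '/aaa/bbb/ccc/' \
              '/aaa/bbb/ccc/ddd/eee' '/aaa/bbb/ccc/ddd' |
    leaves_with_suffix
```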
# 4  
10-21-2007
Folks;
I very much appreciate your help, but the requirements have changed (I apologize for the confusion), and I would appreciate some help with the new version:

Now I need to look at each line (each line is a directory tree), and for each tree search the whole file for the shortest related one (the top of that tree, i.e. the path that is not under any other path), then append a text phrase to it and redirect the output to a new text file.
In detail:

Let's say the text file looks like:

/aa/bb/cc/dd/ee
/xxx/yyy/zzz
/aa/bb/cc
/xxx/yyy/zzz/fff/nnn
/aa/bb/cc/dd
/mm/uu/ss/tt/rr
/mm/uu/ss/tt


For this sample, I should take the first line, look for related trees, and keep looking until I find the shortest one, which in this example is "/aa/bb/cc" with only three directories; the other two related lines in the file have longer paths (/aa/bb/cc/dd/ee and /aa/bb/cc/dd).
So after I extract the shortest, "/aa/bb/cc", I append a phrase or another folder such as "plus" so that it looks like "/aa/bb/cc/plus", then redirect the result to a new text file.
Then I go to the second line and do the same thing.

I hope I explained it well.

Once again, I appreciate the help.
# 5  
10-22-2007
If I understand correctly and sorting is acceptable:
Code:
sort file|awk '!x[$2]++&&$0=$0"/plus"' FS="/">new_text_file

Use nawk or /usr/xpg4/bin/awk on Solaris.
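Spread out and commented, the one-liner works roughly as follows. This is a minimal sketch: top_of_tree_plus is a hypothetical name, and LC_ALL=C is added here only for a reproducible order. Note that it keys on the first directory component alone, so two separate trees under the same top directory, such as /aa/bb and /aa/cc, yield only /aa/bb/plus:

```shell
#!/bin/sh
# top_of_tree_plus (hypothetical name): the one-liner above, spread out.
# With FS="/", a line such as "/aa/bb/cc" splits into $1="", $2="aa", $3="bb",
# so $2 is the first directory below "/".
top_of_tree_plus() {
    LC_ALL=C sort "$@" |
    awk -F/ '
        # seen[$2]++ is 0 (false) the first time a top directory appears, so
        # !seen[$2]++ is true only for the first line of each group in sorted
        # order; for nested paths that is the top of the tree, because a
        # prefix sorts before its extensions.
        !seen[$2]++ { print $0 "/plus" }
    '
}

printf '%s\n' /aa/bb/cc/dd/ee /xxx/yyy/zzz /aa/bb/cc /xxx/yyy/zzz/fff/nnn \
              /aa/bb/cc/dd /mm/uu/ss/tt/rr /mm/uu/ss/tt |
    top_of_tree_plus
```

For the sample from the previous post this prints /aa/bb/cc/plus, /mm/uu/ss/tt/plus and /xxx/yyy/zzz/plus.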
# 6  
10-22-2007
Try and adapt the following script:
Code:
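LC_ALL=C sort katkota.dat |
awk '
   /^#/ { next }                                   # skip comment lines
   # strip trailing slashes and blanks; the first path becomes the first root
   { gsub(/\/*[[:space:]]*$/, ""); if (! root) root = $0 }
   # a line that does not start with "root/" begins a new tree:
   # print the finished tree and make this line the new root
   root { if (match($0 "/", "^" root "/") == 0) {
           print root "/file"
           root = $0
        }
   }
   END { if (root) print root "/file" }            # print the last tree
'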

Input:
Code:
/a
/b
/c
/usr
/new/tree
/new/tree/xxx/yyy/zzz
#/new/free/opt/yyy
/aaa/bbb/ccc/
/aaa/bbb/ccc/ddd/eee
/aaa/bbb/ccc/ddd
/aaa/xxx
/aaa/xxx/yyy/zzz
#end of datas

Output:
Code:
k2.sh
/a/file
/aaa/bbb/ccc/file
/aaa/xxx/file
/b/file
/c/file
/new/tree/file
/usr/file

Jean-Pierre.
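To see how the script decides, a minimal tracing sketch of the same idea prints a message for every input line. trace_roots is a hypothetical name, and it tests "starts with root/" with index() rather than match(), so a "." in a path is compared literally instead of as a regex metacharacter:

```shell
#!/bin/sh
# trace_roots (hypothetical name): same idea as the script above, but it
# prints a message for every input line so each decision is visible.
trace_roots() {
    LC_ALL=C sort "$@" |
    awk '
        /^#/ { print "skip comment: " $0; next }
        { gsub(/\/*[ \t]*$/, "") }          # drop trailing slashes and blanks
        $0 == "" { next }
        # the line lies below the current root: same tree, nothing to print yet
        root != "" && index($0 "/", root "/") == 1 {
            print "child of " root ": " $0
            next
        }
        # otherwise the previous tree is finished and this line starts a new one
        {
            if (root != "") print "tree done:   " root "/file"
            root = $0
            print "new root:    " root
        }
        END { if (root != "") print "tree done:   " root "/file" }
    '
}

printf '%s\n' '/a' '/b' '/aaa/bbb/ccc/' '/aaa/bbb/ccc/ddd' '#end of datas' |
    trace_roots
```

The "tree done:" lines correspond to the lines the script above prints; the other lines only explain why a path was skipped or started a new tree.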
# 7  
10-22-2007
Thanks a lot.
But Aigles, could you please explain your code to me? I'm a little puzzled by it.
