Command to read between patterns in a while


 
# 1  
Old 01-31-2017

I am currently working on a requirement where I have to extract the characters between two specific tags/patterns in a file and get the total number of characters between them.


REQUIREMENT:

The content below is in a file.
1. I have to get the number of characters between each instance of <test> and </test1> throughout the file.
2. With the character counts obtained, find:
the number of tags that have more than 30 characters between them
the number of tags that have fewer than 30 characters between them



Ex:

Code:
<test>12312 njh</test1>
<test>abcdedfg
hijklmno
abched</test1>

I tried to do this with the sed command, but I could not get any output.



Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 01-31-2017 at 10:08 AM.. Reason: Added CODE tags.
# 2  
Old 01-31-2017
Please show your attempts with sed so we can give you a hand.
# 3  
Old 01-31-2017
Hi RudiC,
Actually, the input file [input.txt] has tags other than <test></test1>.

Example:
Code:
<test>12312 njh</test1>
<tag1>apple</tag1>
<tag2>orange</tag2>
<test>abcdedfg
hijklmno
abched</test1>
<test>apple ball
cat orange
pineapple
mango</test1>

I used:
Code:
sed -n '/<test>/,/<\/test1>/{s/<tag1>.*//;s/<tag2>.*//;p;}' input.txt

That gave me only the required tags in the output, but I am not able to count the characters between the tags, because some tags contain newlines and some do not. I am new to shell scripting. Please help.

Expected output is :
Code:
0-30 char : ?
>30 characters :?



Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by venkidhadha; 01-31-2017 at 10:46 AM.. Reason: Added code
# 4  
Old 01-31-2017
Hello venkidhadha,

Based on the sample input you have shown, could you please try the following and let me know if it helps?
Code:
awk -vST=": TAG has characters replacement count is: " '($0 ~ /<test>/ && $0 ~ /<\/test1>/){gsub(/<test>|<\/test1>/,"");N=gsub(/[a-zA-Z]/,"");print ++instance ST N;if(N>30){MAX++} else {MIN++};next} ($0 ~ /<test>/){A=1;sub(/<test>/,"");Q+=gsub(/[a-zA-Z]/,"")} ($0 ~ /<\/test1>/ && A){A="";sub(/<\/test1>/,"");Q+=gsub(/[a-zA-Z]/,"");print ++instance ST Q;if(Q>30){MAX++} else {MIN++};Q=0;next} A{Q+=gsub(/[a-zA-Z]/,"")} END{printf("%s%01d\n%s%01d\n","Number of tags having more than 30 replacement of characters are: ",MAX,"Number of tags having less than 30 replacement of characters are: ",MIN)}'    Input_file

Output will be as follows.
Code:
1: TAG has characters replacement count is: 3
2: TAG has characters replacement count is: 22
Number of tags having more than 30 replacement of characters are: 0
Number of tags having less than 30 replacement of characters are: 2
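
The counts 3 and 22 above are letter counts, not total character counts: gsub() returns the number of replacements it made, and the pattern only matches a-z and A-Z. A minimal sketch of the difference, using the text of the first tag as a sample (the variable names line, letters and all are made up for illustration):

```shell
# Sample text taken from the first <test> ... </test1> pair in the thread.
line='12312 njh'

# gsub() returns the number of replacements. Replacing each match with "&"
# puts the matched letter back, so the text stays unchanged while it is counted.
letters=$(printf '%s\n' "$line" | awk '{ print gsub(/[a-zA-Z]/, "&") }')

# length() counts every character, including the digits and the space.
all=$(printf '%s\n' "$line" | awk '{ print length($0) }')

echo "letters only:   $letters"
echo "all characters: $all"
```

Which of the two numbers is wanted depends on what "character" means in the requirement.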

EDIT: Adding a non-one-liner form of the solution too.
Code:
awk -vST=": TAG has characters replacement count is: " '
# Opening and closing tag on the same line: count the letters of this line only.
($0 ~ /<test>/ && $0 ~ /<\/test1>/) {
    gsub(/<test>|<\/test1>/,"");
    N=gsub(/[a-zA-Z]/,"");
    print ++instance ST N;
    if(N>30){
        MAX++
    } else {
        MIN++
    }
    next
}
# Opening tag only: start accumulating in Q.
($0 ~ /<test>/) {
    A=1;
    sub(/<test>/,"");
    Q+=gsub(/[a-zA-Z]/,"")
}
# Closing tag of a multi-line block: add the last line, report, then reset Q.
($0 ~ /<\/test1>/ && A) {
    A="";
    sub(/<\/test1>/,"");
    Q+=gsub(/[a-zA-Z]/,"");
    print ++instance ST Q;
    if(Q>30){
        MAX++
    } else {
        MIN++
    }
    Q=0;
    next
}
# Any other line inside an open block.
A {
    Q+=gsub(/[a-zA-Z]/,"")
}
END {
    printf("%s%01d\n%s%01d\n","Number of tags having more than 30 replacement of characters are: ",MAX,"Number of tags having less than 30 replacement of characters are: ",MIN)
}
' Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-01-2017 at 02:23 AM.. Reason: Adding a non-one liner form of solution too now successfully.
# 5  
Old 01-31-2017
Code:
awk '
/<\// {sub("</[^>]*>", ""); sub("<[^>]*>", ""); s=s $0; (length(s) <= 30) ? l30++ : g30++; s=""; next}
{sub("<[^>]*>", ""); sub("</[^>]*>", ""); s=s $0}
END {print "0-30 char : " l30;
     print ">30 characters : " g30;
}' input.txt
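
Since the input in post #3 also contains other tags such as <tag1> and <tag2>, one option is to track only the <test> ... </test1> blocks and ignore everything else. The following is a minimal sketch under a few assumptions: every character between the tags counts (digits, spaces and punctuation included), line breaks do not count, and at most one <test> block starts on any line. The sample_input helper and the variable names are made up for illustration.

```shell
# Hypothetical helper that prints the sample input from post #3.
sample_input() {
cat <<'EOF'
<test>12312 njh</test1>
<tag1>apple</tag1>
<tag2>orange</tag2>
<test>abcdedfg
hijklmno
abched</test1>
<test>apple ball
cat orange
pineapple
mango</test1>
EOF
}

# Accumulate text only while inside <test> ... </test1>, then tally by length.
tally=$(sample_input | awk '
  /<test>/ { inside = 1; buf = ""; sub(/.*<test>/, "") }   # start a new block
  inside {
    if (match($0, /<\/test1>/)) {                          # block ends on this line
      buf = buf substr($0, 1, RSTART - 1)
      if (length(buf) <= 30) le30++; else gt30++
      inside = 0
    } else {
      buf = buf $0                                         # block continues
    }
  }
  END {
    print "0-30 char : " (le30 + 0)
    print ">30 characters : " (gt30 + 0)
  }
')
echo "$tally"
```

The three blocks in the sample hold 9, 22 and 34 characters, so this reports two blocks in the 0-30 group and one above 30.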

# 6  
Old 01-31-2017
I'm afraid sed (alone) can't do that, as it can neither calculate nor count. On top of that, your request is not quite clear: does the term "char" as you use it include digits, punctuation, etc., or not? Please specify. If all of that is included, try a combination like
Code:
sed ':L; $ {s/\n//g;
            s/<\/test1>/\n/g
            s/<test>\|<tag.>.*<\/tag.>//g
            s/\n$//}
      N; bL
' file |
{ while read LN
     do [ ${#LN} -gt 30 ] && A=$((A+1)) || U=$((U+1))
     done
     echo "0 - 30 char: " $U
     echo "  > 30 char: " $A
}
0 - 30 char:  2
  > 30 char:  1
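
In the loop above, ${#LN} expands to the length of $LN. A line of exactly 30 characters sits on the boundary, so the choice between -gt and -ge decides which group it lands in. A small sketch of that boundary, assuming the groups are meant to be 0-30 and >30 (classify is a hypothetical helper, and the test strings are made up):

```shell
# classify is a hypothetical helper: prints "le30" or "gt30" for its argument.
classify() {
    s=$1
    if [ "${#s}" -gt 30 ]; then
        echo gt30
    else
        echo le30
    fi
}

exactly30=$(printf '%030d' 0)    # a string of thirty zeros
exactly31=$(printf '%031d' 0)    # a string of thirty-one zeros

r30=$(classify "$exactly30")
r31=$(classify "$exactly31")
echo "30 chars -> $r30"
echo "31 chars -> $r31"
```

With -ge in place of -gt, the 30-character string would be counted in the >30 group instead.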

# 7  
Old 02-01-2017
Quote:
Originally Posted by RudiC
I'm afraid sed (alone) can't do that, as it can't calculate nor count. [...]

Just tried it for fun :)

Code:
$ sed ':L; $ {s/\n//g;
>             s/<\/test1>/\n/g
>             s/<test>\|<tag.>.*<\/tag.>//g
>             s/\n$//}
>       N; bL
> ' f | sed -n "s/^.\{1,30\}$/lt/p" | sed -n "/lt/{$ =;}"
2
$ sed ':L; $ {s/\n//g;
>             s/<\/test1>/\n/g
>             s/<test>\|<tag.>.*<\/tag.>//g
>             s/\n$//}
>       N; bL
> ' f | sed -n "s/^.\{31,\}$/gt/p" | sed -n "/gt/{$ =;}"
1
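
In both pipelines the last two sed stages only count lines by length: the first rewrites every line in the wanted length range to a marker and prints it, and the second prints the number of the last line, which equals the count. Once the text is joined to one tag per line, grep -c with the same interval expressions should give the same numbers. A sketch, where the joined helper is a made-up stand-in for the output of the first sed stage on the sample from post #3:

```shell
# Hypothetical stand-in for the joined, one-tag-per-line text.
joined() {
cat <<'EOF'
12312 njh
abcdedfghijklmnoabched
apple ballcat orangepineapplemango
EOF
}

short=$(joined | grep -c '^.\{1,30\}$')    # lines with 1 to 30 characters
long=$(joined | grep -c '^.\{31,\}$')      # lines with 31 or more characters
echo "0-30 char : $short"
echo ">30 characters : $long"
```

Like the sed version, the 1-to-30 pattern skips empty lines, so a tag pair with nothing between it would not be counted in either group.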
