Split content based on keywords

# 1  
Old 03-12-2019

I need to split the contents of a file into multiple rows based on patterns.

Sample:
Input:
Code:
ABC101testXYZ102UKMNO1092testing
ABC999testKMNValid

Output:
Code:
ABC101test
XYZ102U
KMNO1092testing
ABC999test
KMNValid

Here, ABC, XYZ and KMN are the patterns.

Last edited by Jairaj; 03-12-2019 at 05:41 AM..
# 2  
Old 03-12-2019
My last example was not entirely correct.
Code:
sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//' file

--- Post updated at 13:45 ---

And the first one is better corrected as well :)
Code:
sed -r 's/\B(ABC|XYZ|KMN)/\n&/g' file

# 3  
Old 03-12-2019
It's working. Thanks!

Can you tell me how the flow of this statement (command) works?
# 4  
Old 03-12-2019
Hello Jairaj,

In awk, could you please try the following.
Code:
awk '{gsub("ABC|XYZ|MNO|KMN",ORS"&");sub(/^\n/,"")} 1'  Input_file

Thanks,
R. Singh
# 5  
Old 03-13-2019
It's working. Thanks!

Can you tell me how the flow of this statement (command) works?
# 6  
Old 03-13-2019
Hi Jairaj,
I'm sorry, I have problems with English, I can not.
Enter this command in the terminal
Code:
LESS=+/" *s/regexp/replacement/" man sed

# 7  
Old 03-14-2019
Quote:
Originally Posted by nezabudka
I'm sorry, I have problems with English, I can not.
If I may try?

Code:
sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//' file

This sed-program consists of two statements which are applied one after the other to every line:

Code:
s/ABC\|XYZ\|KMN/\n&/g
s/^\n//

Let us start with the second one as it is easier: it is a "replacement" command and replaces one expression with another. Actually the "s" stands for "substitute":

Code:
s/<something to match>/<something that replaces what was matched>/

What does it replace? It matches the start of the line (^) followed by a newline character (\n) and replaces that with nothing. The start of the line is an anchor, not a character, so in effect the command deletes a newline character, but only one that sits at the very beginning of the line. Newline characters anywhere else are left alone.

The first statement is a bit more complicated: basically it is a replacement command too and works the same way as the second statement. Now, what does it replace?

Code:
/ABC\|XYZ\|KMN/

This matches one of the strings separated by the escaped pipe-characters, so effectively it matches either "ABC" or "XYZ" or "KMN". Now, what will these strings be replaced with?

Code:
/\n&/

The first element is \n, which means a newline character. The second, &, stands for whatever the pattern matched. As I said, the pattern will match one of three different strings. Whichever string was matched is put here, so effectively the command replaces the string with itself plus a newline character up front.
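
To see "&" on its own, here is a small sketch with made-up inputs (the strings "hello" and "ABC101" and the variable names are only for illustration). It uses nothing beyond basic sed syntax, so it should behave the same with any sed:

```shell
# "&" in the replacement stands for whatever the pattern matched.
# Wrap every "l" in brackets:
bracketed=$(printf 'hello\n' | sed 's/l/[&]/g')

# Wrap the first run of digits in angle brackets:
tagged=$(printf 'ABC101\n' | sed 's/[0-9][0-9]*/<&>/')

printf '%s\n' "$bracketed" "$tagged"
# prints:
# he[l][l]o
# ABC<101>
```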

The final g is just an option and says that the operation should occur as often as possible and not only for the first opportunity. If you have a substitution command like:

Code:
s/a/b/

It will replace "a" with "b" but only the first occurrence of "a". An input string of "aaa" will become "baa", but with the "g" in place it will become "bbb" because all the "a"s will be replaced, not only the first one. So, to put it all together, this is what will happen to an input string:

Code:
# input string:
ABC101testXYZ102UKMNO1092testing

# after first command (newlines are encoded as "\n" for better understanding):
\nABC101test\nXYZ102U\nKMNO1092testing

# after the second command:
ABC101test\nXYZ102U\nKMNO1092testing

# what will really be written (newlines not encoded any more):
ABC101test
XYZ102U
KMNO1092testing
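
The walkthrough above can be reproduced in a terminal. This is only a sketch and assumes GNU sed, because it relies on "\|" in the pattern and "\n" in the replacement; the variable names are made up for illustration:

```shell
line='ABC101testXYZ102UKMNO1092testing'

# The "g" flag on its own: first match only versus every match.
first_only=$(printf 'aaa\n' | sed 's/a/b/')     # baa
every=$(printf 'aaa\n' | sed 's/a/b/g')         # bbb

# First statement only: the result starts with an empty line,
# because a newline was also put in front of the leading "ABC".
step1=$(printf '%s\n' "$line" | sed 's/ABC\|XYZ\|KMN/\n&/g')

# Both statements: the second one removes that leading newline again.
both=$(printf '%s\n' "$line" | sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//')

printf '%s\n' "$step1" | wc -l    # 4 lines, the first one empty
printf '%s\n' "$both"  | wc -l    # 3 lines
```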


Quote:
Originally Posted by nezabudka
Code:
sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//' file

Code:
sed -r 's/\B(ABC|XYZ|KMN)/\n&/g' file

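Notice that the "-r" switch for Extended Regular Expressions, the "\|" alternation in the first command, the "\B" in the second and the use of "\n" as a newline character in the replacement are GNU extensions. None of them is guaranteed to work in a standard-conforming sed.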
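
If portability matters, one possible workaround is sketched below. It assumes only basic sed features: one "s" command per keyword instead of an alternation, and a backslash followed by a real line break instead of "\n" in the replacement. The function name split_keywords is made up for illustration:

```shell
# Hypothetical helper: reads lines on stdin, starts a new line before each keyword.
split_keywords() {
    sed 's/ABC/\
&/g
s/XYZ/\
&/g
s/KMN/\
&/g
s/^\n//'
}

printf 'ABC101testXYZ102UKMNO1092testing\nABC999testKMNValid\n' | split_keywords
```

Using "\n" in the pattern of the last command (as opposed to the replacement) to match a newline inside the line being edited is standard behaviour. The trade-off is that the keyword list is spread over several commands, which is fine for three keywords but gets clumsy for many.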

There are several (similar but not identical) regular expression engines used in UNIX/Linux:

The most basic "regular expressions", although they are usually called "file globs", are used by the shell: for example, in the expression filename* the "*" is expanded to any string of any length.
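
A glob can be tried without touching any files, because the shell's "case" statement uses the same pattern syntax. A minimal sketch (the function name matches_glob is made up for illustration):

```shell
# Hypothetical helper: $1 is the string to test, $2 is the glob pattern.
matches_glob() {
    case $1 in
        $2) echo yes ;;   # unquoted, so $2 is treated as a pattern
        *)  echo no  ;;
    esac
}

matches_glob filename.txt 'filename*'    # prints: yes
matches_glob file.txt     'filename*'    # prints: no
```

Note that "*" in a glob means "any string", while in a regular expression it means "zero or more of the previous expression".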

Then there are Basic Regular Expressions or "BRE"s. The syntax of BREs is standardized by POSIX and is used in utilities like sed, grep (in its default mode, see below) and so on.

Notice that the GNU project deviated from this standard and developed their own variant of BREs, the GNU Basic Regular Expressions. The GNU variants of sed, grep and so on use these instead of the POSIX BREs. One example of the difference between GNU-BREs and POSIX-BREs is the quantifier "\+", which means "one or more (of the previous expression)". For instance, the regexp:

Code:
/Xa*Y/

will match "XaY", "XaaY" and so on, but also "XY". To exclude the latter and restrict the pattern to one or more "a" you would need to write

Code:
/Xaa*Y/         # POSIX, variant 1
/Xa\{1,\}Y/     # POSIX, variant 2
/Xa\+Y/         # GNU

Notice that the two POSIX variants are understood by every BRE engine, whereas the GNU variant is an extension you can rely on only with GNU tools. An unescaped "+" is an ordinary character in a BRE; it becomes a quantifier only in EREs (see below).
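
These variants can be compared with grep, which uses BREs by default. A small sketch with made-up sample lines (the helper name sample is only for illustration). The last command assumes GNU grep, where the one-or-more quantifier in BRE mode is spelled "\+":

```shell
# Hypothetical helper that prints three test lines.
sample() { printf 'XY\nXaY\nXaaY\n'; }

sample | grep 'Xa*Y'         # zero or more "a":   XY, XaY, XaaY
sample | grep 'Xaa*Y'        # POSIX, variant 1:   XaY, XaaY
sample | grep 'Xa\{1,\}Y'    # POSIX, variant 2:   XaY, XaaY
sample | grep 'Xa\+Y'        # GNU extension:      XaY, XaaY
```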

Then there are Extended Regular Expressions or EREs. EREs are basically a superset of BREs but with a few quirks. For instance, you do not escape grouping parentheses or numerical quantifiers:

Code:
/Xa\{1,\}Y/     # BRE
/Xa{1,}Y/       # ERE
/X\(abc\)*Y/    # BRE
/X(abc)*Y/      # ERE

There is a POSIX standard for these and they are used in utilities like awk, grep -E (the -E option switches the used regexp engine from BRE to ERE), egrep (this is basically a grep with the -E option set and fixed) and so on.
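
The same pattern in both dialects can be compared side by side. A small sketch with made-up sample lines (the helper name sample is only for illustration):

```shell
# Hypothetical helper that prints four test lines.
sample() { printf 'XY\nXabcY\nXabcabcY\nXaY\n'; }

sample | grep    'X\(abc\)*Y'    # BRE, escaped group:   XY, XabcY, XabcabcY
sample | grep -E 'X(abc)*Y'      # ERE, plain group:     XY, XabcY, XabcabcY
sample | grep -E 'Xa{1,}Y'       # ERE, plain interval:  XaY
sample | awk '/X(abc)+Y/'        # awk uses EREs too:    XabcY, XabcabcY
```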

Again, GNU has its own variant of EREs, called GNU-ERE, which is used in the GNU variants of awk, egrep, etc., and also in GNU sed when it is called with the "-E" or the equivalent "-r" switch.

I hope this helps.

bakunin