Split content based on keywords


# 1  
Split content based on keywords

I need to split the file contents into multiple rows based on patterns.

Sample:
Input:
Code:
ABC101testXYZ102UKMNO1092testing
ABC999testKMNValid

Output:
Code:
ABC101test
XYZ102U
KMN1092testing
ABC999test
KMNValid

Here ABC, XYZ and KMN are the patterns.

Last edited by Jairaj; 03-12-2019 at 05:41 AM..
# 2  
My last example is not entirely correct:
Code:
sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//' file

--- Post updated at 13:45 ---

And it is better to correct the first one as well:
Code:
sed -r 's/\B(ABC|XYZ|KMN)/\n&/g' file

# 3  
It’s working. Thanks!

Can you tell me how this statement (command) works, step by step?
# 4  
Hello Jairaj,

Could you please try the following in awk.
Code:
awk '{gsub("ABC|XYZ|MNO|KMN",ORS"&");sub(/^\n/,"")} 1'  Input_file

Thanks,
R. Singh
# 5  
It’s working. Thanks!

Can you tell me how this statement (command) works, step by step?
# 6  
Hi Jairaj,
I'm sorry, I have problems with English, so I cannot explain it.
Enter this command in the terminal:
Code:
LESS=+/" *s/regexp/replacement/" man sed

# 7  
Quote:
Originally Posted by nezabudka
I'm sorry, I have problems with English, I can not.
If I may try?

Code:
sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//' file

This sed-program consists of two statements which are applied one after the other to every line:

Code:
s/ABC\|XYZ\|KMN/\n&/g
s/^\n//

Let us start with the second one as it is easier: it is a "replacement" command and replaces one expression with another. Actually the "s" stands for "substitute":

Code:
s/<something to match>/<something that replaces what was matched>/

What does it replace? It replaces a start-of-line (^) followed by a newline character (\n) with nothing. The start-of-line is not really a character, so effectively it deletes a newline character if that newline is the very first thing in the line, and it leaves all other newline characters untouched.

The first line is a bit more complicated: basically it is a replacement command too and works the same way as the second line. Now, what does it replace?

Code:
/ABC\|XYZ\|KMN/

This matches one of the strings separated by the escaped pipe-characters, so effectively it matches either "ABC" or "XYZ" or "KMN". Now, what will these strings be replaced with?

Code:
/\n&/

The first part is \n, which means a newline character. The second part, &, stands for whatever the first expression has matched. As I said, the first expression will match one of three different strings. Whichever string was matched is put here, so effectively the command replaces the string with itself plus a newline character in front.

The final g is just an option and says that the operation should occur as often as possible and not only for the first opportunity. If you have a substitution command like:

Code:
s/a/b/

It will replace "a" with "b" but only the first occurrence of "a". An input string of "aaa" will become "baa", but with the "g" in place it will become "bbb" because all the "a"s will be replaced, not only the first one. So, to put it all together, this is what will happen to an input string:

Code:
# input string:
ABC101testXYZ102UKMNO1092testing

# after first command (newlines are encoded as "\n" for better understanding):
\nABC101test\nXYZ102U\nKMNO1092testing

# after the second command:
ABC101test\nXYZ102U\nKMNO1092testing

# what will really be written (newlines not encoded any more):
ABC101test
XYZ102U
KMNO1092testing
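
If you want to check this yourself, here is a small sketch that first tries the "g" option on its own and then runs the first command alone and both commands together. It assumes GNU sed (because of "\|" and of "\n" in the replacement); the variable names are only made up for this illustration.

```shell
# Assumption: GNU sed ("\|" and "\n" in the replacement are GNU extensions).
line='ABC101testXYZ102UKMNO1092testing'

# The "g" option on its own: without it only the first match is replaced.
first_only=$(printf 'aaa\n' | sed 's/a/b/')     # baa
all=$(printf 'aaa\n' | sed 's/a/b/g')           # bbb

# First command only: a newline is put in front of every keyword,
# including the one at the very start of the line.
step1=$(printf '%s\n' "$line" | sed 's/ABC\|XYZ\|KMN/\n&/g')

# Both commands: the newline at the start of the line is removed again.
both=$(printf '%s\n' "$line" | sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//')

printf '%s\n' "$step1" | wc -l    # 4 lines, the first one is empty
printf '%s\n' "$both"  | wc -l    # 3 lines
```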


Quote:
Originally Posted by nezabudka
Code:
sed 's/ABC\|XYZ\|KMN/\n&/g;s/^\n//' file

Code:
sed -r 's/\B(ABC|XYZ|KMN)/\n&/g' file

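Notice that the "-r" switch for Extended Regular Expressions as well as the usage of "\n" as a newline character in the replacement are not guaranteed by a standard-conforming sed; GNU sed happens to support both. The same is true for the "\|" alternation used in the first variant.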
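
If portability matters, something along the following lines should get by without these extensions. It is only a sketch: the function name split_keywords is made up, each keyword gets its own "s" command instead of an alternation, and the newline in the replacement is written as a backslash followed by a real line break. Using "\n" inside the pattern (as opposed to the replacement) is fine, as far as I know.

```shell
# split_keywords is a hypothetical helper name, not part of any tool.
# It reads the named files, or standard input if no file is given.
split_keywords() {
    sed -e 's/ABC/\
&/g' -e 's/XYZ/\
&/g' -e 's/KMN/\
&/g' -e 's/^\n//' "$@"
}

printf '%s\n' 'ABC101testXYZ102UKMNO1092testing' 'ABC999testKMNValid' |
    split_keywords
```

The price is one "s" command per keyword, so with a long keyword list the awk solution from post #4 is probably easier to maintain.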

There are several (similar but not identical) regular expression engines used in UNIX/Linux:

The most basic "regular expressions", although they are usually called "file globs" rather than regular expressions, are used by the shell: for example the expression filename*, where "*" is expanded to any string of any length.
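
To illustrate how differently "*" behaves in a glob and in a regexp, here is a short sketch. The helper name is_glob_match is made up; it relies on the fact that the shell's "case" statement uses the same pattern syntax as file name expansion.

```shell
# is_glob_match is a hypothetical helper name.
is_glob_match() {
    case $1 in
        filename*) return 0 ;;   # "*" alone stands for any string, even an empty one
        *)         return 1 ;;
    esac
}

is_glob_match filename     && echo "filename matches"
is_glob_match filename.txt && echo "filename.txt matches"
is_glob_match myfilename   || echo "myfilename does not match"

# In a regexp "*" is a quantifier instead: "filename*" means "filenam"
# followed by any number of "e", and without anchors it matches anywhere
# in the line, so both of these lines are counted.
printf '%s\n' filenam myfilename | grep -c 'filename*'
```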

Then there are Basic Regular Expressions or "BRE"s. The syntax of BREs is standardized by POSIX and is used in utilities like sed, grep (in its default mode, see below) and so on.

Notice that the GNU project extended this standard and developed its own variant of BREs, the GNU Basic Regular Expressions. The GNU variants of sed, grep and so on use these instead of the plain POSIX BREs. One example of the difference between GNU-BREs and POSIX-BREs is the quantifier "\+", which means "one or more of the previous expression". For instance, the regexp:

Code:
/Xa*Y/

will match "XaY", "XaaY" and so on, but also "XY". To exclude the latter and restrict the pattern to one or more "a" you would need to write

Code:
/Xaa*Y/         # POSIX, variant 1
/Xa\{1,\}Y/     # POSIX, variant 2
/Xa\+Y/         # GNU

Notice that the two POSIX variants are understood by every BRE engine, whereas the GNU variant is an extension and not portable. An unescaped "+" in a BRE is just a literal plus sign.
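
These variants can be compared with grep, which uses BREs by default. The following is just a sketch with a made-up helper, count_matches, that feeds three test strings to grep and prints how many of them match. Note that in BRE syntax the GNU "one or more" quantifier is written with a backslash, "\+".

```shell
# count_matches is a hypothetical helper name.
count_matches() {
    printf '%s\n' XY XaY XaaY | grep -c -- "$1"
}

count_matches 'Xa*Y'        # 3: "XY" matches too
count_matches 'Xaa*Y'       # 2: POSIX, variant 1
count_matches 'Xa\{1,\}Y'   # 2: POSIX, variant 2
count_matches 'Xa\+Y'       # 2 with GNU grep; not guaranteed elsewhere
```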

Then there are Extended Regular Expressions or EREs. EREs are basically a superset of BREs but with a few quirks. For instance, you do not escape grouping parentheses or numerical quantifiers:

Code:
/Xa\{1,\}Y/     # BRE
/Xa{1,}Y/       # ERE
/X\(abc\)*Y/       # BRE
/X(abc)*Y/         # ERE

There is a POSIX standard for these and they are used in utilities like awk, grep -E (the -E option switches the used regexp engine from BRE to ERE), egrep (this is basically a grep with the -E option set and fixed) and so on.
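
The pairs above can be tried side by side with grep and grep -E. Again this is only a sketch; the function name samples and the four test strings are made up.

```shell
# samples is a hypothetical helper that prints four test strings.
samples() { printf '%s\n' XY XaY XabcY XabcabcY; }

samples | grep    -c 'X\(abc\)*Y'    # BRE: 3 (XY, XabcY, XabcabcY)
samples | grep -E -c 'X(abc)*Y'      # ERE: 3, the same three strings
samples | grep    -c 'Xa\{1,\}Y'     # BRE: 1 (XaY)
samples | grep -E -c 'Xa{1,}Y'       # ERE: 1 (XaY)
```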

Again, GNU has its own variant of ERE, called GNU-ERE, which is used in the respective GNU tools (GNU-awk, GNU-egrep, etc.) and also in GNU-sed when it is called with the "-E" switch or the equivalent "-r" switch.
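
The second variant quoted above can be tried in the same way. As far as I understand it, "\B" (a GNU extension) only matches where there is no word boundary, so a keyword at the very start of a line is left alone and no leading newline has to be removed afterwards. The sketch assumes GNU sed, and the function name split_gnu is made up.

```shell
# Assumption: GNU sed ("-E", "\B" and "\n" in the replacement).
# split_gnu is a hypothetical helper name.
split_gnu() {
    sed -E 's/\B(ABC|XYZ|KMN)/\n&/g' "$@"
}

printf '%s\n' 'ABC101testXYZ102UKMNO1092testing' 'ABC999testKMNValid' |
    split_gnu
```

One caveat: a keyword that follows a blank or a punctuation character sits on a word boundary too, so this variant will not split there, whereas the first variant will.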

I hope this helps.

bakunin