awk or sed or python for regular expressions?


 
# 1  
Old 08-28-2015

Linux 6.x environments (RHEL, Oracle Linux)

I can write basic shell scripts in bash.
In my spare time, I was planning to learn awk or sed for the regular expression tasks I have to deal with. But I gather that python is gaining popularity these days, and I came to know that python has decent regular expression functionality.

I haven't worked with awk, sed or python before, and I want to learn one of these for regular expression tasks. So I would like to hear from you guys which one is powerful and simple.
# 2  
Old 08-28-2015
Most, if not all, programming languages have at least some sort of 'REGEX' (Regular Expressions).

And the basic REGEX is usually the same, at least on POSIX systems.
Some tools provide their own extended REGEX.
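
As a rough illustration of that point (a sketch, not from the original post): the same match written once as a POSIX basic regex for sed and once as an extended regex for grep -E and awk. The sample lines and the function names are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample input, invented for this illustration.
sample() { printf 'error 42\nok\nerror 7\nerror x\n'; }

# Extended regex (ERE): '+' means "one or more".
match_with_grep() { sample | grep -E '^error [0-9]+$'; }

# Basic regex (BRE): no '+', so "one or more digits" is spelled out.
match_with_sed() { sample | sed -n '/^error [0-9][0-9]*$/p'; }

# awk patterns use extended regex syntax.
match_with_awk() { sample | awk '/^error [0-9]+$/'; }

match_with_grep
match_with_sed
match_with_awk
```

All three select the lines `error 42` and `error 7`; only the spelling of "one or more digits" differs between the basic and the extended syntax.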

Once you have understood the basic REGEX, there is not much of a difference between AWK and SED, as they are both command-line utilities.
Python, however, is a programming language, and therefore much more powerful (in a general approach/use case) than the other two.

Know thy tools.
Awk is almost a programming language, and a stream editor.
Python is a programming language.
Sed is a tool to add/replace strings, and a stream editor.

hth
This User Gave Thanks to sea For This Post:
# 3  
Old 08-28-2015
To a degree I agree with sea.

It all depends on whether you are an amateur or (semi-)professional at this coding lark.

Awk is a programming language with the ability to stream...

I started coding in Python from version 1.4.0 for the AMIGA A1200 to current versions on various platforms.

As it has evolved it has become a serious contender for THE definitive scripting language with a plethora of libraries for just about everything you need to do in the computing world.

However, I switched to UNIX scripting about 2.5 years ago and all but abandoned other languages, including Python, as shell scripting, IMO, is just SOOOOO flexible.

Yes, it is cumbersome at times and slow at some tasks, but if you can shell script, then expand on it and try some serious app of your own to find its limits. I keep finding what I consider a limit at something, and then the big guns on here often show me the way in/out...

Regular expressions are part of the shell scripting make-up, and with the extra tools, e.g. grep, sed and the like, you have just about everything you need.

I know that some big guns on here will probably disagree with me on some points but I do look at it from an amateur POV, not professional...

Hope this helps...
This User Gave Thanks to wisecracker For This Post:
# 4  
Old 08-28-2015
Python is a language with a severe case of library-itis. Everything is libraries to such an absurd extreme that Python code is seldom portable unless you habitually use the exact same libraries and versions as someone else. Otherwise you're in for a few rounds of installation whack-a-mole.

It also has an eyebrow-raising amount of overhead. Witness the number of opened files it takes for various languages to sit there and do absolutely nothing:

Code:
$ strace python /dev/null 2>&1 | grep "^open" | egrep -v 'ENOENT|[.]so' | wc -l
42

$ strace php /dev/null 2>&1 | grep "^open" | egrep -v 'ENOENT|[.]so' | wc -l
6

$ strace bash /dev/null 2>&1 | grep "^open" | egrep -v 'ENOENT|[.]so' | wc -l
5

$ strace perl /dev/null 2>&1 | grep "^open" | egrep -v 'ENOENT|[.]so' | wc -l
3

$ strace awk '{}' /dev/null 2>&1 | grep "^open" | egrep -v 'ENOENT|[.]so' | wc -l
3


I would suggest starting with awk for a few reasons. For one thing, it's a small, relatively simple language; it's native to UNIX everywhere (unlike perl and python); and it's for the most part quite portable (use nawk on Solaris). sed is useful sometimes but not actually a language (some argue otherwise, but it's pretty far down in the Turing tarpit).

Last edited by Corona688; 08-28-2015 at 12:35 PM..
This User Gave Thanks to Corona688 For This Post:
# 5  
Old 08-28-2015
Hi.

As with many aspects of life, it depends.

I agree with Corona wrt Python. I bought the 5th edition of Learning Python earlier this year -- 1500 pages!

However, for pros, it is growing in use -- it's hard to beat a number-4-slot: Interactive: The Top Programming Languages 2015 - IEEE Spectrum

When I was designing and presenting training courses, I included awk in the Intermediate unix-like classes (Solaris, Linux, etc.). It is easy to use, fast, and you will get a few of the ideas of C from the advanced syntax. In Solaris 11, Oracle has provided gawk (3.1.8) in their repository (plus a number of other GNU utilities).

I tend to dislike Java for the same reason that Corona mentioned -- our Java instructor often commented that it was the only language for which one needed a library manual open while coding. However, Java was designed with portability as a primary concern -- write once, run anywhere.

For writing unix-like utilities without resorting to C/C++, I usually use perl. The Intro to perl courses were usually 3-4 8-hour days with a LOT of hands-on, as was the Intermediate Unix.

If you are going to be working with big data, extraordinarily long lines, or compute-intensive operations, you may need other tools, but for most people, awk is the right tool for most things. When a task calls for anything to do with fields, I usually reach for awk, if there is not already a utility that will do the job.
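
A minimal sketch of the kind of field-oriented job meant here, assuming a hypothetical pipe-delimited input of the form name|status|count. The data and the function names are invented for illustration and are not from the post.

```shell
#!/bin/sh
# Hypothetical records: name|status|count
records() { printf 'alpha|up|3\nbeta|down|5\ngamma|up|4\n'; }

# Print the first field of every record whose second field matches ^up$.
up_names() { records | awk -F'|' '$2 ~ /^up$/ { print $1 }'; }

# Sum the third field over the same records and print the total at the end.
up_total() { records | awk -F'|' '$2 ~ /^up$/ { total += $3 } END { print total }'; }

up_names
up_total
```

The regex is applied to a single field (`$2 ~ /.../`) rather than to the whole line, which is the part that gets awkward in sed or grep alone.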

Good luck ... cheers, drl
This User Gave Thanks to drl For This Post:
# 6  
Old 08-29-2015
Thanks Sea, Wisecracker, Corona, drl