Sponsored Content
Full Discussion: Python with Regex and Excel
Top Forums Shell Programming and Scripting Python with Regex and Excel Post 302998440 by durden_tyler on Thursday 1st of June 2017 03:30:46 PM
Old 06-01-2017
I do not have openpyxl module in my system, so I tried your regular expression with a simple text file with some dummy data that matches your regex pattern.

Note a couple of things:
1) Characters like ";" and ":" do not have special meaning in a regex, so they need not be escaped by backslash "\" character.
2) The hyphen "-" has special meaning only inside brackets. Otherwise, it need not be escaped.
3) You can reduce the regular expression, but it becomes unreadable very quickly. That's true for any regular expression. So you have to find a balance between readability and succinctness.

Code:
C:\data\>
C:\data\>type testdata.log
Line  1 : Action Performed:
Line  2 : ACTION PERFORMED:
Line  3 : Steps taken to resolve the issue:
Line  4 : steps taken to resolve issue:
Line  5 : Steps Taken to Resolved the issue:
Line  6 : Steps taken to resolve the issue :
Line  7 : Steps taken to resolve:
Line  8 : Steps-
Line  9 : steps -
Line 10 : steps:
Line 11 : Steps taken:
Line 12 : Action taken:
Line 13 : Actions Taken:
Line 14 : Action:
Line 15 : Action -
Line 16 : Action taken -
Line 17 : Actions Taken;
Line 18 : Action Taken :
Line 19 : Action :
Line 20 : Action taken to resolve:
Line 21 : Resolution-
Line 22 : Resolution:
Line 23 : Resolution -
Line 24 : Action taken for resolution:
Line 25 : Solution :
Line 26 : analysis:
Line 27 : analysis :
Line 28 : Investigation:
Line 29 : observed/investigated :

C:\data\>
C:\data\>type processdata.py
import sys
import re
regularexp = "Steps Taken to Resolve(d)*\s*(the)*\s*issue\s*:(.*)|" \
             "Steps taken to resolve:(.*)|" \
             "Steps taken:(.*)|" \
             "steps\s*[:-](.*)|" \
             "Action taken (for|to) resol(ve|ution):(.*)|" \
             "Action(s)* (Performed|Taken)\s*[;:-](.*)|" \
             "Action\s*[:-](.*)|" \
             "(Resolution|Solution|analysis|investigation|observed\/investigated)\s*[:-](.*)"
act_resolution = re.compile(regularexp, re.IGNORECASE)
datafile = sys.argv[1]
fh = open(datafile, 'r')
for line in fh:
    line = line.rstrip("\n")
    if act_resolution.search(line) is not None:
        print line
    else:
        print "[UNMATCHED]>> ", line
fh.close()
 
C:\data\>
C:\data\>python processdata.py testdata.log
Line  1 : Action Performed:
Line  2 : ACTION PERFORMED:
Line  3 : Steps taken to resolve the issue:
Line  4 : steps taken to resolve issue:
Line  5 : Steps Taken to Resolved the issue:
Line  6 : Steps taken to resolve the issue :
Line  7 : Steps taken to resolve:
Line  8 : Steps-
Line  9 : steps -
Line 10 : steps:
Line 11 : Steps taken:
Line 12 : Action taken:
Line 13 : Actions Taken:
Line 14 : Action:
Line 15 : Action -
Line 16 : Action taken -
Line 17 : Actions Taken;
Line 18 : Action Taken :
Line 19 : Action :
Line 20 : Action taken to resolve:
Line 21 : Resolution-
Line 22 : Resolution:
Line 23 : Resolution -
Line 24 : Action taken for resolution:
Line 25 : Solution :
Line 26 : analysis:
Line 27 : analysis :
Line 28 : Investigation:
Line 29 : observed/investigated :
  
C:\data\>
C:\data\>

If you try hard enough, you can reduce that regex to a single string.
But the maintainer of your script may not be very happy about it.

Last edited by durden_tyler; 06-01-2017 at 04:43 PM..
This User Gave Thanks to durden_tyler For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

PERL: Split Excel Workbook to Indiv Excel files

Hi, I am trying to find a way to read an excel work book with multiple worksheets. And write each worksheet into a new excel file using perl. My environment is Unix. For example: I have an excel workbook TEST.xls and it has Sheet1, Sheet2, Sheet3 worksheets. I would like to create... (2 Replies)
Discussion started by: sandeep78
2 Replies

2. Shell Programming and Scripting

Python Regex

I have the below string and regex. However I cant understand why it works the way it does. IP has been changed for safety ;) String = NowSMS Error Report. Error initializing SMSC Interface 'SMPP - 10.15.8.10:17600'. Interface is not available. Regex = (.+\.)\s(.+) I get two... (1 Reply)
Discussion started by: barney34
1 Replies

3. Shell Programming and Scripting

Converting specific Excel file tabs to CSV in Python

Hi list, This is probably something really simple, but I am not particularly familiar with Python so I thought I would ask as I know that python has an excel module. I have an excel document with multiple tabs of data and graphs. One of the tabs is just data which I require to have dumped to... (8 Replies)
Discussion started by: landossa
8 Replies

4. Shell Programming and Scripting

Python Newbie Question Regex

I starting teaching myself python and am stuck on trying to understand why I am not getting the output that I want. Long story short, I am using PDB for debugging and here my function in which I am having my issue: import re ... ... ... def find_all_flvs(url): soup =... (1 Reply)
Discussion started by: metallica1973
1 Replies

5. Shell Programming and Scripting

Perl script to Merge contents of 2 different excel files in a single excel file

All, I have an excel sheet Excel1.xls that has some entries. I have one more excel sheet Excel2.xls that has entries only in those cells which are blank in Excel1.xls These may be in different workbooks. They are totally independent made by 2 different users. I have placed them in a... (1 Reply)
Discussion started by: Anamika08
1 Replies

6. Programming

Python Reading Individual files and Regex through them

As a newbie to Python, I am trying to write a script in which is will add all the log files (*.log) from within a directory to a list, open the files and search for an ip using a regex and single it out (appending the ip's to the list). So far, I have: import re, os def list_files() content = ... (4 Replies)
Discussion started by: metallica1973
4 Replies

7. Shell Programming and Scripting

Python Regex Removing One Too Many...

Well, I'm a python noob and my last post here I was introduced to Regex. I thought this would be easy since I knew Regex with Bash. However, I've been banging my head a while to extract an ip address from ifconfig with this: #!/usr/bin/python import re import subprocess from subprocess... (5 Replies)
Discussion started by: Azrael
5 Replies

8. Programming

Python Regex List Creation

Here is a snippet of my code: blahblahblah... blah for link in goodies.soup.find_all('a'): blah.append(link.get('href')) blah=list(set(blah)) which gives my list of urls. So now I use a regex to search for the relevant urls which I want in a list: for r... (0 Replies)
Discussion started by: metallica1973
0 Replies

9. Shell Programming and Scripting

Convert excel to csv in python date not display exactly

Hi, Anyone can help I am just converting the excel file to csv using python, I can get the conversion output but the date not display exactly. test.xlsx date format 167 1588 risks/SCS JP CAMPANA & CIE.pdf SCS JP CAMPANA & CIE 2 1 1 0 2015-03-16 16:56:25 167 1146 risks/AirBNB... (1 Reply)
Discussion started by: fspalero
1 Replies

10. Programming

Creating filters with Python on excel

Hello, I have an excel sheet with 11 tabs. I need to take data from the first tab and write the output to the second tab. The first tab looks like this, starting from Row 3 The filters that needs to be created are 1) keep anything greater than 'POS' 5 and less than 160 AND 2)... (2 Replies)
Discussion started by: nans
2 Replies
All times are GMT -4. The time now is 05:24 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy