I have a Python script that pulls keywords out of a data set. The data set contains 3 columns:
1. SysID 2. ID 3. Comment Section.
The script currently extracts keywords from the Comment column and displays only those keywords, without any of the other columns.
Can someone help me alter the script so that it trims the Comment column down to its key words in each row, without dropping or truncating the other columns?
Code:
#!/usr/bin/env python2.7
import numpy as np
from collections import Counter
import csv


class Preprocess_data():
    def __init__(self, data, k_number_of_features=5000):
        self.k = k_number_of_features
        # The third column of every row holds the comment text.
        self.words = zip(*data)[2]

    def get_word(self, data):
        punc1 = ("~`!@#$%^&*()_-+=[]{}\|;:',<.>/?")
        punc2 = ('"')
        wordsbag = []
        words = zip(*data)[2]
        # Lower-case each comment and strip punctuation.
        words = [item.lower().translate(None, punc1).translate(None, punc2) for item in words]
        self.words = [item.split() for item in words]
        # Count each word at most once per comment.
        for line in self.words:
            wordsbag.extend(set(line))
        return wordsbag

    def count_attr(self, data):
        c = Counter(self.get_word(data))
        # Skip the 100 most common words and keep the next k.
        feature = c.most_common(100+self.k)[100:100+self.k]
        return feature

    def summarize_feature(self, data):
        words = self.words
        feature = self.count_attr(data)
        # One row per comment, one column per feature word (1 = word present).
        feature_value = np.zeros((len(data), len(feature)))
        for i in range(len(words)):
            for j in range(len(feature)):
                if (feature[j][0] in words[i]):
                    feature_value[i][j] = 1
                else:
                    feature_value[i][j] = 0
        return feature_value


if __name__=='__main__':
    file = open('testfile', 'rU')
    data = list(csv.reader(file, delimiter='\t'))
    # The original post passed 'n' here; the value must be an integer.
    preprocessed = Preprocess_data(data, k_number_of_features=5000)
    wordsbag = preprocessed.get_word(data)
    feature = preprocessed.count_attr(data)
    feature_value = preprocessed.summarize_feature(data)
    #-------print the words ranked after the 100 most common---------#
    for i in range(min(3000, len(feature))):
        print 'WORD' + str(i+1), feature[i][0]
Sample Dataset
Code:
4819 810 The locker doors "Inside" were marked and not polished properly.
4885 1313 The seal around / on top of the flush panel is damaged.
4932 825 The clock facing the bag drop drive way is not set correctly / displays incorrect time.
5067 744 Gaps are visible between the interlock flooring tiles.
5027 737 The menu is damaged.
5067 748 The wall is seen blistered.
4845 825 The left side of the panel is fused.
4952 810 The terrace tiles are damaged.
5496 1044 tetst
5022 732 The service door is left open and construction equipment is left unattended.
5496 1044 test
5496 2009 test
4952 810 The terrace tiles are cracked /damaged.
5058 1110 The
5067 2022 The umbrella's bases of the restaurant are seen dusty and dirty.
5058 1110 The Interlock flooring is seen damaged and stained.
5058 1110 Gaps are visible between Interlock flooring.
5058 1110 Several toilet cubicles doors are seen chipped.
5489 824 tttt
5058 1110 The prayer timings electrical board has been removed during painting and never returned back and a mark is visible on the wall.
4771 693 The toilet cubicle skirtings are scratched.
5026 52 The terrace is damaged.
5027 737 The menu is damaged.
5026 743 The terrace is damaged.
4906 24 fgfgf
5059 829 The wall around the A/C grill is stained.
5059 829 The door stopper is missing and tile is damaged by door handle.
5059 829 The soap holder is missing.
5059 829 The douche tap fitting is loose.
5059 829 The corner of the wall is damaged and moldy.
5059 829 The ping pong table is damaged.
5059 829 The sign at the gate to pool area is faded.
5059 829 The protective net is not properly installed. The fitting is untidy.
5059 829 The pool loungers are stained.
5059 829 The corner of the wall is damaged and moldy.
5059 829 The corner of the wall is damaged and moldy.
5059 829 The corner of the wall is damaged and moldy.
5058 1117 The empty unit is seen not hoarded; window is dirty and dust is visible from the window.
5058 1110 The flooring arrows are faded and worn.
5490 1957 test
5022 732 There appears to be water damage on the dipped ceiling.
4825 833 The
5022 727 The information about where the stairs lead to is missing.
5022 732 The stairs walls are all blank. Information about what is at the top of the stairs needs to be added to those walls.
5022 732 The yellow exit sign painted on the wall is damaged above it and the paint is uneven and untidy.
5022 732 The yellow car park sign hanging from the ceiling is chipped at the lower left ledge.
5056 833 Ceiling access panels are still found missing.
5056 833 Main door is damaged on lower edge.
5022 732 There is yellow tape in a square shape left above the Tche Tche Cafe sign on the wall.
5056 833 Tiles panels are damaged.
Current Output from the script is below
Code:
WORD1 working
WORD2 correctly
WORD3 cover
WORD4 ac
WORD5 doors
WORD6 it
WORD7 full
WORD8 display
WORD9 parking
WORD10 heavily
WORD11 wooden
WORD12 for
WORD13 edges
WORD14 humidity
WORD15 cubicles
WORD16 fitted
WORD17 out
WORD18 room
WORD19 tree
WORD20 behind
WORD21 fence
WORD22 ok
WORD23 dusty
WORD24 cabinet
WORD25 along
WORD26 rusty
WORD27 overgrown
WORD28 as
WORD29 signs
WORD30 protruding
WORD31 painted
WORD32 fountain
WORD33 covered
WORD34 does
WORD35 dry
WORD36 availability
WORD37 lift
WORD38 operational
WORD39 severally
WORD40 poor
WORD41 found
WORD42 litter
WORD43 blistered
I foresee problems with the approach of excluding common words. "Damaged" is an important word, but also common in your data. "Not" is also common and kind of vital. And when your data changes, so will whatever words you exclude.
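To make that concrete, here is a rough sketch of what a frequency-based cutoff does to a handful of comments taken from the sample data. It is written for Python 3 rather than the Python 2.7 of the original script, and the `tokenize` helper and the cutoff of 3 are assumptions made up for this illustration, not part of the posted code.

```python
from collections import Counter
import string

# Five comments copied from the sample data set.
comments = [
    "The menu is damaged.",
    "The terrace tiles are damaged.",
    "The wall is seen blistered.",
    "The terrace is damaged.",
    "The soap holder is missing.",
]


def tokenize(text):
    """Lower-case the text, strip ASCII punctuation, split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


# Count each word once per comment, as the original script does.
counts = Counter()
for comment in comments:
    counts.update(set(tokenize(comment)))

# Treat the 3 most frequent words as "too common" (hypothetical cutoff).
top3 = [word for word, _ in counts.most_common(3)]

# Whatever survives for the first comment once those words are removed.
kept = [w for w in tokenize(comments[0]) if w not in top3]

print(top3)
print(kept)
```

On this small sample, "damaged" lands in the excluded set right next to "the" and "is", so the first comment is reduced to just "menu".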
How important a word is also depends on context. No data is lost by deleting "left" from "door left open", but data is lost by deleting it from "left door open".
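A tiny Python 3 sketch of that problem, assuming a naive filter that drops a word wherever it appears (the `drop_word` helper is hypothetical, made up for this example):

```python
def drop_word(text, word):
    """Remove every occurrence of `word` from a space-separated phrase."""
    return " ".join(w for w in text.split() if w != word)


# "left" is filler in the first phrase but identifies the door in the second.
a = drop_word("door left open", "left")
b = drop_word("left door open", "left")

print(a)
print(b)
```

Both phrases collapse to the same string, so the filter cannot tell which "left" mattered.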
You can build lists of exclusions and special words until the cows come home, and then one funny case will come along which blows it all out of the water. Add one more special case for that word and special case special cases for any odd but valid ways that word might be used. Rinse and repeat until you lose your mind or your code gains sentience.
I'm not sure true English language processing can be implemented in a tinkertoy.
Deleting common words like "the" and "is", that's certainly doable.
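As a starting point, something along these lines might do. It is only a sketch, written for Python 3 rather than the Python 2.7 of the original script. The `STOP_WORDS` set and the `trim_comment` and `trim_rows` helpers are assumptions invented for this example; the stop-word list in particular is a placeholder you would have to tune for your own data. It keeps SysID and ID untouched and only rewrites the third column.

```python
import csv
import io
import string

# Hypothetical stop-word list; adjust it to suit the real data.
STOP_WORDS = {"the", "is", "are", "a", "an", "of", "and", "to", "on", "in"}


def trim_comment(comment, stop_words=STOP_WORDS):
    """Return the comment with stop words and edge punctuation removed."""
    kept = []
    for word in comment.split():
        bare = word.strip(string.punctuation).lower()
        if bare and bare not in stop_words:
            kept.append(bare)
    return " ".join(kept)


def trim_rows(rows):
    """Keep the first two columns as they are and trim only the third."""
    out = []
    for row in rows:
        if len(row) < 3:
            # Pass short or malformed rows through unchanged.
            out.append(list(row))
            continue
        out.append([row[0], row[1], trim_comment(row[2])])
    return out


# Two tab-separated rows from the sample data, read the same way as 'testfile'.
sample = "5027\t737\tThe menu is damaged.\n5067\t748\tThe wall is seen blistered.\n"
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

for row in trim_rows(rows):
    print("\t".join(row))
```

Because the list is fixed instead of being derived from word frequency, "damaged" and "not" survive no matter how often they occur. The context problem described above still applies to anything you add to the list.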
Last edited by Corona688; 11-16-2018 at 12:19 PM..