Appending a column in xlsx file using Python

Tags
append, excel, openpyxl, overwrite, python

#8  06-23-2017 - Original Discussion by nans
durden_tyler (Forum Advisor), Join Date: Apr 2009, Posts: 2,083
Hmm... the indentation seems a bit awry.

Quote:
Originally Posted by nans
...
...

Code:
import os
from openpyxl import load_workbook
from datetime import datetime
import csv
  
# Variables
sheet_directory = r'/home/test'
text_directory = r'/home/test'
  
# Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    first_line = True
    with open('scores.txt') as txt_filename:
        tab_reader = csv.reader(txt_filename, delimiter='\t')
        for line in tab_reader:
            if first_line:
                first_line = False
                continue
                line = line.rstrip('\n')  # Move this line and the next two one level out
                x = line.split('\t')
                dict_pos[x[0]] = x[3]
                #print dict_pos          # Move this line and the next one three levels out
                return dict_pos


def process_xl_sheets():
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('raw_data')
                pos_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                cell = ws[pos_col_no + str(row_no)]
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                        row_no += 1
                        cell = ws[pos_col_no + str(row_no)]
                        wb.save(sheet_xl_file)

                # Main section
process_xl_sheets()

Here's my code for reference. Check the indentation level and the comments:


Code:
# Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    first_line = True
    for text_root, text_dirs, text_files in os.walk(text_directory):
        for text_file in text_files:
            if text_file == txt_filename:
                # A matching text file was found; "with" closes it when the block ends
                with open(os.path.join(text_root, text_file)) as fh:
                    for line in fh:
                        # Skip the header; read the data into the dictionary
                        if first_line:                 # One level inside "for line" loop
                            first_line = False         # One level inside "for line" loop, one level inside "if first_line" branch
                            continue                   # One level inside "for line" loop, one level inside "if first_line" branch
                        line = line.rstrip('\n')       # One level inside "for line" loop
                        x = line.split('\t')           # One level inside "for line" loop
                        dict_pos[x[0]] = x[3]          # One level inside "for line" loop
    return dict_pos                                # One level inside "def"; this is at subroutine level

In your code, "first_line" is True when the subroutine is entered, so it is still True for the first line of "tab_reader".
You then set it to False and "continue", but everything else is also indented inside that "if" branch, after the "continue", so it can never run.
From the second line of "tab_reader" onwards, control never enters the "if" branch because "first_line" was set to False on the first line. As a result, nothing is added to "dict_pos" and the "return" is never reached.
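
To make the effect visible, here is a minimal sketch that puts the two indentation variants side by side. It uses Python 3 syntax (the thread itself is Python 2), and the function names and the sample rows are made up purely for illustration.

```python
def parse_misindented(lines):
    """Mirrors the quoted layout: everything sits inside the 'if' branch."""
    dict_pos = {}
    first_line = True
    for line in lines:
        if first_line:
            first_line = False
            continue
            # Unreachable: 'continue' has already jumped to the next iteration.
            x = line.rstrip('\n').split('\t')
            dict_pos[x[0]] = x[3]
            return dict_pos
    # No 'return' is ever executed, so the caller receives None.


def parse_fixed(lines):
    """Same logic with the data handling moved out of the 'if' branch."""
    dict_pos = {}
    first_line = True
    for line in lines:
        if first_line:
            first_line = False
            continue
        x = line.rstrip('\n').split('\t')
        dict_pos[x[0]] = x[3]
    return dict_pos


# Hypothetical tab-separated sample: a header followed by two data rows.
sample = [
    "pos\tref\talt\tscore\n",
    "101\tA\tG\t0.91\n",
    "202\tC\tT\t0.15\n",
]

print(parse_misindented(sample))
print(parse_fixed(sample))
```

The first call prints None because no "return" statement is ever executed; the second prints a dictionary with the two data rows.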

---------- Post updated at 12:13 PM ---------- Previous update was at 12:08 PM ----------

Also check the level of "wb.save()" in your code. It should be at the same level as "while cell.value".
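
The loop logic can be sketched without a real workbook. The version below is only an illustration in Python 3 syntax: "FakeCell", "FakeSheet" and "fill_scores" are made-up names, and "FakeSheet" mimics just the two things the quoted code relies on (reading ws['F4'].value and assigning ws['V4'] = ...). It also advances "row_no" on every pass; in the quoted code those two lines sit inside the "else" branch, which looks like it would loop forever on the first position that is found in the dictionary.

```python
from datetime import datetime


class FakeCell:
    def __init__(self, value=None):
        self.value = value


class FakeSheet:
    """Hypothetical stand-in for a worksheet; not part of openpyxl."""
    def __init__(self, cells):
        self.cells = dict(cells)

    def __getitem__(self, ref):
        return FakeCell(self.cells.get(ref))

    def __setitem__(self, ref, value):
        self.cells[ref] = value


def fill_scores(ws, dpos, pos_col='F', score_col='V', first_row=4):
    """Write a score next to every position until an empty cell is hit."""
    fallback = 'Unknown_' + datetime.now().strftime('%B%Y')
    row_no = first_row
    cell = ws[pos_col + str(row_no)]
    while cell.value:
        key = str(cell.value)
        ws[score_col + str(row_no)] = dpos.get(key, fallback)
        # Advance on every iteration, whether or not the key was found.
        row_no += 1
        cell = ws[pos_col + str(row_no)]
    return row_no - first_row   # number of rows processed


ws = FakeSheet({'F4': 101, 'F5': 999, 'F6': 202})
rows = fill_scores(ws, {'101': '0.91', '202': '0.15'})
# With a real workbook, wb.save(...) would go here: after the loop, once per file.
print(rows, ws.cells['V4'], ws.cells['V5'], ws.cells['V6'])
```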

#9  06-24-2017 - Original Discussion by nans
nans (Registered User), Join Date: Mar 2013, Posts: 74
It still doesn't work for me. The code runs but doesn't produce any output at all, not even the printed 'dict_pos'. I have attached the code I am using now.



Code:
#!/usr/bin/python

import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')

import os
from openpyxl import load_workbook
from datetime import datetime
import csv

# Variables
sheet_directory = r'/home/test'
text_directory = r'/home/test'

# Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    first_line = True
    with open('scores.txt') as txt_filename:
        tab_reader = csv.reader(txt_filename, delimiter='\t')
        for line in tab_reader:
            if first_line:
                first_line = False
                continue
            line = line.rstrip('\n')
            x = line.split('\t')
            dict_pos[x[0]] = x[3]
        print dict_pos          
        return dict_pos


def process_xl_sheets():
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('raw_data')
                pos_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                cell = ws[pos_col_no + str(row_no)]
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                        row_no += 1
                        cell = ws[pos_col_no + str(row_no)]
                wb.save(sheet_xl_file)

# Main section
process_xl_sheets()

#10  06-25-2017
durden_tyler (Forum Advisor)
Quote:
Originally Posted by nans
It still doesn't work for me. The code runs but doesn't produce any output at all, not even the printed 'dict_pos'. I have attached the code I am using now.
...
...
Does it throw any error messages?

#11  06-25-2017
nans
Quote:
Originally Posted by durden_tyler
Does it throw any error messages?
No error msg.

#12  06-25-2017
durden_tyler (Forum Advisor)
My hunch is that:
(a) either there are no ".xlsx" files in "/home/test" or
(b) there are ".xlsx" files in "/home/test" but the Python script does not have permission to write to them

Print the value of "sheet_file" right after the "if sheet_file.endswith()" condition, run the Python program from the command line and post the result here (select, copy, paste from your terminal window.)
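
A slightly more thorough check than a single print could look like the sketch below. It is Python 3 syntax, "report_xlsx_files" is a made-up helper name, and the temporary directory only stands in for "/home/test". It lists every ".xlsx" file that os.walk finds and whether the current user may write to it, which covers both (a) and (b).

```python
import os
import tempfile


def report_xlsx_files(directory):
    """Return (path, writable) for every .xlsx file under 'directory'."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith('.xlsx'):
                path = os.path.join(root, name)
                found.append((path, os.access(path, os.W_OK)))
    return found


# Demo on a throwaway directory; point this at the real directory instead.
demo_dir = tempfile.mkdtemp()
for name in ('data.xlsx', 'scores.txt'):
    open(os.path.join(demo_dir, name), 'w').close()

report = report_xlsx_files(demo_dir)
for path, writable in report:
    print(path, 'writable' if writable else 'NOT writable')
if not report:
    print('No .xlsx files found under', demo_dir)
```

An empty report points to case (a); a path flagged as not writable points to case (b).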

#13  06-26-2017
nans
Following your suggestion, I now get the error shown below.
All the files and the script are in the same directory. I've also made sure the xlsx file has the right permissions.

Code:
Traceback (most recent call last):
  File "/home/test/annotate.py", line 53, in <module>
    process_xl_sheets()
  File "/home/test/annotate.py", line 35, in process_xl_sheets
    dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
  File "/home/test/annotate.py", line 23, in get_text_data
    line = line.rstrip('\n')
AttributeError: 'list' object has no attribute 'rstrip'

#14  06-26-2017
durden_tyler (Forum Advisor)
Quote:
Originally Posted by nans
Following your suggestion, I now get the error shown below.
All the files and the script are in the same directory. I've also made sure the xlsx file has the right permissions.

Code:
Traceback (most recent call last):
  File "/home/test/annotate.py", line 53, in <module>
    process_xl_sheets()
  File "/home/test/annotate.py", line 35, in process_xl_sheets
    dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
  File "/home/test/annotate.py", line 23, in get_text_data
    line = line.rstrip('\n')
AttributeError: 'list' object has no attribute 'rstrip'

You are getting the "list has no attribute rstrip" error because you are calling the "rstrip()" method on the list (or array) called "line".

The "rstrip('\n')" method removes trailing newline ('\n') characters from a string. It cannot work on a list. (What would the trailing characters of a list be?)
It is documented here: 7.1. string — Common string operations — Python 2.7.13 documentation
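
The difference is easy to reproduce in isolation. A small sketch in Python 3 syntax, with made-up sample values:

```python
text = "Europe\tGermany\tBerlin\n"
row = ['Europe', 'Germany', 'Berlin']

# Strings have an rstrip() method ...
stripped = text.rstrip('\n')
print(repr(stripped))

# ... lists do not, so the same call raises AttributeError.
try:
    row.rstrip('\n')
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)
```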

Notice that "line" in your code is a list (or array), but "line" in my code is a string. That's because you are reading your file with csv.reader, which returns a reader object. When you iterate over that reader object, each item it yields is a list (or array) of the fields in one row.
It is documented here: 13.1. csv — CSV File Reading and Writing — Python 2.7.13 documentation

To give a concrete example, if a tab-separated file looks like this:


Code:
$
$ cat -n countries.txt
    1  Continent       Country Capital
    2  Europe  Germany Berlin
    3  North America   Canada  Ottawa
    4  Africa  Namibia Windhoek
    5  Asia    Japan   Tokyo
$
$

then my code does something like this (check the comments):


Code:
>>>
>>>
>>> fh = open('countries.txt')
>>> for line in fh:
...     print 'line is a string: ==>|', line, '|<=='   # 'line' is a string with a newline character at the end
...     line = line.rstrip('\n')                       # Strip the newline character at the end of the string 'line'
...     x = line.split('\t')                           # Now split the string 'line' on the Tab character ('\t') to obtain the list (or array) 'x'
...     print 'x is an array:    ==>|', x, '|<==\n\n'  # Print the list (or array) 'x'
...
line is a string: ==>| Continent        Country Capital
|<==
x is an array:    ==>| ['Continent', 'Country', 'Capital'] |<==
 
line is a string: ==>| Europe   Germany Berlin
|<==
x is an array:    ==>| ['Europe', 'Germany', 'Berlin'] |<==
 
line is a string: ==>| North America    Canada  Ottawa
|<==
x is an array:    ==>| ['North America', 'Canada', 'Ottawa'] |<==
 
line is a string: ==>| Africa   Namibia Windhoek
|<==
x is an array:    ==>| ['Africa', 'Namibia', 'Windhoek'] |<==
 
line is a string: ==>| Asia     Japan   Tokyo
|<==
x is an array:    ==>| ['Asia', 'Japan', 'Tokyo'] |<==
 
>>>
>>>
>>>

And your code does something like this (check the comments):


Code:
>>>
>>> import csv
>>> with open('countries.txt') as txt_filename:
...     tab_reader = csv.reader(txt_filename, delimiter='\t')    # tab_reader is a reader object
...     print 'tab_reader is:        ==>|', tab_reader, '|<=='
...     for line in tab_reader:
...         print 'line is an array: ==>|', line, '|<=='         # 'line' is a list (or array)
...
tab_reader is:        ==>| <_csv.reader object at 0x000000000250D8E8> |<==
line is an array: ==>| ['Continent', 'Country', 'Capital'] |<==
line is an array: ==>| ['Europe', 'Germany', 'Berlin'] |<==
line is an array: ==>| ['North America', 'Canada', 'Ottawa'] |<==
line is an array: ==>| ['Africa', 'Namibia', 'Windhoek'] |<==
line is an array: ==>| ['Asia', 'Japan', 'Tokyo'] |<==
>>>
>>>

So, the inference is that "x" in my code is the same as "line" in your code.
I hope you know how to proceed from here.
Try to form the dictionary "dict_pos" and post your attempt if you cannot make it work.
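
For readers who want to compare notes afterwards, one possible way to finish the exercise is sketched below. It is Python 3 syntax rather than the Python 2 used in this thread, "build_dict_pos" is a made-up name, and the four-column sample (with the score in the fourth column, matching x[3]) is invented, since the real "scores.txt" was never posted.

```python
import csv
import io


def build_dict_pos(handle):
    """Map column 0 to column 3 for every data row of a tab-separated file."""
    dict_pos = {}
    tab_reader = csv.reader(handle, delimiter='\t')
    next(tab_reader, None)          # skip the header row, if there is one
    for row in tab_reader:          # 'row' is already a list of fields
        if len(row) > 3:            # ignore blank or short rows
            dict_pos[row[0]] = row[3]
    return dict_pos


# Invented sample standing in for scores.txt.
sample = io.StringIO(
    "pos\tref\talt\tscore\n"
    "101\tA\tG\t0.91\n"
    "202\tC\tT\t0.15\n"
)
dict_pos = build_dict_pos(sample)
print(dict_pos)
```

With a real file, the handle would come from open(path, newline=''), which is how the csv documentation recommends opening files in Python 3. No rstrip() or split() is needed, because the reader has already separated the fields.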