Appending a column in xlsx file using Python

Tags
append, excel, openpyxl, overwrite, programming, python

# 15  
Old 06-26-2017
Thank you! That makes sense. So now when I print dict_pos, it seems to have formed a dictionary (not posting all of it as it's large):

Code:
/usr/bin/python2.7 /home/test/annotate.py
S12.xlsx
{'4300': '5', '3921': '1', '9072': '1', '16343': '1', '14007': '1', '13759': '1', '14911': '1', '14178': '1', '14179': '1', '16140': '1', '13359': '1', '4024': '1', '4025': '1'}

But when writing to the excel worksheet, instead of writing the score, all values end up as 'Unknown_June2017'

Also, is it possible to form a dictionary with multiple keys ? For example, I need the first three columns in the 'score.txt' to be associated with the score value and that needs to be compared with column 5,6,7 from the worksheet

Code:
dict_pos[x[0],[1],[2]] = x[3]


Last edited by nans; 06-26-2017 at 12:22 PM..
# 16  
Old 06-26-2017
Quote:
Originally Posted by nans
...But when writing to the excel worksheet, instead of writing the score, all values end up as 'Unknown_June2017'

Also, is it possible to form a dictionary with multiple keys ? For example, I need the first three columns in the 'score.txt' to be associated with the score value and that needs to be compared with column 5,6,7 from the worksheet

Code:
dict_pos[x[0],[1],[2]] = x[3]

My hunch is that you are looking at the wrong column.
If your "pos_col_no" is "F" and "row_no" is 4, then the code will look at cells F4, F5, F6, F7, F8, .... and check if they are keys of dictionary "dpos".

Since you see 'Unknown_June2017' in cells V4, V5, V6, V7, V8, ... it means that the keys are not in column F but in some other column.
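
To make the lookup concrete, here is a minimal sketch of what the loop does. It is not the real openpyxl code: a plain dictionary called `cells` (a name I made up) stands in for the worksheet, the sample values are invented, and the fallback is shortened to 'Unknown'. With openpyxl, the read would be `ws[coordinate].value` instead.

```python
# Stand-in for the worksheet: maps a cell coordinate to its value.
# (Hypothetical data; with openpyxl you would read ws[coordinate].value.)
cells = {'E4': 4300, 'E5': 3921, 'E6': 77, 'F4': 'T', 'F5': 'C', 'F6': 'A'}

# Keys are strings because they were read from a text file.
dpos = {'4300': '5', '3921': '1'}

def lookup_column(col, start_row=4):
    """Walk down one column until an empty cell; return (coordinate, score) pairs."""
    results = []
    row_no = start_row
    while cells.get(col + str(row_no)) is not None:
        coordinate = col + str(row_no)
        key = str(cells[coordinate])          # compare as strings, like the dict keys
        results.append((coordinate, dpos.get(key, 'Unknown')))
        row_no += 1                           # advance on every row, match or not
    return results

print(lookup_column('F'))   # wrong column: no cell value is a key
print(lookup_column('E'))   # right column: the first two rows match
```

If every row comes back as 'Unknown', printing the coordinate next to the value like this is a quick way to see which column is really being read.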

Yes, it's possible to form a dictionary with multiple keys.
You can use a special Python data structure called a "tuple" for that.
A tuple is written as comma-separated elements with parentheses around them, e.g.
Code:
('a', 'b', 'c')

is a tuple.
The code can work without parentheses (for the most part) but it's better to specify them in order to avoid ambiguity.
Like so:

Code:
>>>
>>> color_mix = {}
>>>
>>> color_mix['red', 'blue'] = 'purple'          # works without parentheses
>>>
>>> color_mix[('blue', 'yellow')] = 'green'      # although it's customary to use them
>>>
>>> color_mix['yellow', 'red'] = 'orange'
>>>
>>> for k in color_mix.keys():
...     print k
...
('blue', 'yellow')
('red', 'blue')
('yellow', 'red')
>>>
>>>

Comparing tuples is easy:

Code:
>>>
>>> tuple1 = ('cat', 'dog')
>>> tuple2 = ('dog', 'rat')
>>> tuple3 = ('cat', 'dog')
>>>
>>> tuple1 == tuple2
False
>>>
>>> tuple1 == tuple3
True
>>>
>>> ('hog', 'eel') == tuple1    # Works like this too
False
>>>
>>>

But this is where parentheses are important:

Code:
>>>
>>> 'hog', 'eel' == tuple3      # Nope! Not what you would expect!
('hog', False)
>>>
>>> ('hog', 'eel') == ('dog', 'rat')  # Use parentheses to avoid surprises
False
>>>
>>>
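
Applied to the score file, the same idea could look like the sketch below. The sample line is made up, and I am assuming the first three tab-separated fields form the key and the fourth is the score, as described in the question. Note that the attempted form `dict_pos[x[0],[1],[2]]` does build a tuple, but its second and third elements are the lists `[1]` and `[2]`, and lists cannot be dictionary keys.

```python
# One tab-separated line, split into fields (sample values are made up).
x = '73\tC\tT\t1'.split('\t')

dict_pos = {}

# The first three fields together form the key; the fourth is the score.
dict_pos[(x[0], x[1], x[2])] = x[3]

# Looking up needs a tuple with the same three values in the same order.
print(dict_pos[('73', 'C', 'T')])
print(('73', 'T', 'C') in dict_pos)   # order matters, so this is a different key

# The attempted form puts the lists [1] and [2] inside the key,
# and lists are not hashable, so Python refuses it.
try:
    dict_pos[x[0], [1], [2]] = x[3]
except TypeError as err:
    print('TypeError: %s' % err)
```

The values read from the worksheet have to be turned into the same kind of tuple (same order, same types) before the lookup, otherwise no key will match.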

# 17  
Old 06-27-2017
Thank you.
Yes, I was using the wrong column! Now this is my final code:

Code:
#!/usr/bin/python

import sys
import os
from openpyxl import load_workbook
from datetime import datetime
from pandas import read_table
import csv
from collections import namedtuple

# Variables
sheet_directory = r'/home/test'

# Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    Scores = namedtuple("Scores", ["POS", "ALT", "REF"])
    first_line = True
    with open('/home/test/scores.txt') as txt_filename:
        for line in txt_filename:
            if first_line:
                first_line = False
                continue
            line = line.rstrip('\n')
            x = line.split('\t')
            cpos = Scores(POS=x[0], ALT=x[2], REF=x[1])
            dict_pos[cpos] = x[3]
        print dict_pos          
        return dict_pos


def process_xl_sheets():
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                print(sheet_file)
                dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))  ##what exactly is this part doing ? There is only one text file 'score.txt' to be referenced against several xlsx files named S12.xlsx , S13.xlsx etc
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('raw_data')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                #compare = Scores(POS=pos_col_no, ALT=alt_col_no, REF=ref_col_no)
                #cell = ws[compare + str(row_no) ]
                cell = ws[pos_col_no + alt_col_no + ref_col_no + str(row_no)]
                print cell.value           ##doesn't print
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                        row_no += 1
                        cell = ws[pos_col_no + str(row_no)]
                wb.save(sheet_xl_file)

# Main section
process_xl_sheets()

When I run it now, it doesn't print cell.value. What I'm attempting to do is make the code compare the three columns in the Excel file to the three columns in the text file so that it can output the corresponding score.

Code:
/usr/bin/python2.7 /home/test/annotate.py
S12.xlsx
{Scores(POS='73', ALT='C', REF='CN'): 'A', Scores(POS='497', ALT='C', REF='T'): '1', Scores(POS='2196', ALT='T', REF='C'): '1', Scores(POS='2080', ALT='C', REF='A'): '1', Scores(POS='2456', ALT='C', REF='T'): '1'}
None


Last edited by nans; 06-27-2017 at 08:35 AM.. Reason: updated code
# 18  
Old 06-27-2017
I haven't gone through the entire code yet, and it might be tricky to test your code since I don't have pandas, but let me answer your question about the function call.

Quote:
Originally Posted by nans
...
Code:
#!/usr/bin/python
...
...
 # Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    Scores = namedtuple("Scores", ["POS", "ALT", "REF"])
    first_line = True
    with open('/home/test/scores.txt') as txt_filename:
        for line in txt_filename:
            if first_line:
                first_line = False
                continue
            line = line.rstrip('\n')
            x = line.split('\t')
            cpos = Scores(POS=x[0], ALT=x[2], REF=x[1])
            dict_pos[cpos] = x[3]
        print dict_pos          
        return dict_pos


def process_xl_sheets():
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                print(sheet_file)
                dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))  ##what exactly is this part doing ? There is only one text file 'score.txt' to be referenced against several xlsx files named S12.xlsx , S13.xlsx etc
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
...
 ...
# Main section
process_xl_sheets()

...
...
The replace() method returns a copy of the string in sheet_file with '.xlsx' replaced by '.txt'; sheet_file itself is not changed.
The string variable sheet_file holds the name of your Excel file.
So, let's say while looping through the sheet_directory, your Python program finds an Excel file called "S12.xlsx". Then sheet_file will equal "S12.xlsx".

Thereafter, this expression:
Code:
sheet_file.replace('.xlsx', '.txt')

replaces '.xlsx' with '.txt' and thereby returns 'S12.txt'.

And then this value 'S12.txt' is passed to the function get_text_data().
That is, the value of the string parameter txt_filename is 'S12.txt'.

You can see this very quickly by printing txt_filename the moment you enter the function.
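
A quick way to convince yourself, outside the main program, is the small sketch below. The `os.path.splitext` variant is only an alternative I am adding for comparison, not something the original script uses; it splits off the extension at the end of the name, whereas replace() substitutes every occurrence of '.xlsx' in the string.

```python
import os

sheet_file = 'S12.xlsx'

# str.replace() returns a new string; sheet_file itself is unchanged.
txt_name = sheet_file.replace('.xlsx', '.txt')
print(txt_name)
print(sheet_file)

# Alternative: split off the extension explicitly and attach a new one.
base, ext = os.path.splitext(sheet_file)
print(base + '.txt')
```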

In the "with" statement inside the function "get_text_data", however, you reuse the same name txt_filename. That rebinds the name: from that point on it refers to the open file object, and the string that was passed in is no longer reachable through it.

Thereafter, till the end of the function "get_text_data", txt_filename remains a file object.
So essentially, you are not using the txt_filename parameter in your function at all.

My suggestion: don't pass a parameter to a function if you are not using it at all. You have the text file name ("scores.txt") and location hard-coded anyway.
If something is not needed, discard it. Keep it simple.
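
Here is a rough sketch of both points: the rebinding, and a version of the function that takes no parameter. To keep it runnable anywhere it writes a tiny temporary file instead of reading /home/test/scores.txt; the file contents, the `SCORES_FILE` constant and the `shadowing_demo` helper are all made up for illustration, and I use a plain tuple as the key rather than the namedtuple from the posted code.

```python
import os
import tempfile

# Write a tiny stand-in for scores.txt so the sketch can run anywhere.
# (Contents are made up; the real file lives in /home/test.)
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False)
tmp.write('POS\tREF\tALT\tSCORE\n73\tC\tT\t1\n497\tT\tC\t5\n')
tmp.close()
SCORES_FILE = tmp.name

def shadowing_demo(txt_filename):
    before = type(txt_filename).__name__       # 'str': the value passed in
    with open(SCORES_FILE) as txt_filename:    # rebinds the same name to a file object
        after = type(txt_filename).__name__
    return before, after

def get_text_data():
    """No parameter: the file location is fixed, so nothing needs passing in."""
    dict_pos = {}
    with open(SCORES_FILE) as score_file:
        next(score_file)                       # skip the header line
        for line in score_file:
            x = line.rstrip('\n').split('\t')
            dict_pos[(x[0], x[1], x[2])] = x[3]
    return dict_pos

demo = shadowing_demo('S12.txt')
print(demo)
dpos = get_text_data()     # called with empty parentheses
print(dpos)
os.remove(SCORES_FILE)     # clean up the temporary file
```

The first print shows the type name changing from 'str' to a file type, which is why the value passed in was never used.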
# 19  
Old 06-27-2017
Ah no worries.

If I remove that, then what would be the best way to proceed further?

Code:
#dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('Unannotated')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                #compare = Scores(POS=pos_col_no, ALT=alt_col_no, REF=ref_col_no)
                #cell = ws[compare + str(row_no) ]
                cell = ws[pos_col_no + alt_col_no + ref_col_no + str(row_no)]
                print cell.value
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                        row_no += 1
                        cell = ws[pos_col_no + str(row_no)]

# 20  
Old 06-27-2017
Quote:
Originally Posted by nans
...
If I remove that, then what would be the best way to proceed further
...
Code:
#dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('Unannotated')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                #compare = Scores(POS=pos_col_no, ALT=alt_col_no, REF=ref_col_no)
                #cell = ws[compare + str(row_no) ]
                cell = ws[pos_col_no + alt_col_no + ref_col_no + str(row_no)]
                print cell.value
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                        row_no += 1
                        cell = ws[pos_col_no + str(row_no)]

1) Remove the parameter, not the function call. In other words, do call the function, but don't pass any parameter to it.

2) Change the signature of the function so that it does not accept any parameter.

3) Call the function only once. Calling the same function that reads the same text file and returns the same dictionary every time you find an Excel spreadsheet is extremely inefficient.

4) Check what you are trying to print.
Code:
pos_col_no = 'E'
alt_col_no = 'G'
ref_col_no = 'F'
row_no = 4

So,
Code:
pos_col_no + alt_col_no + ref_col_no + str(row_no) == 'EGF4'

There is no cell called "EGF4" in any Excel spreadsheet.
Hence Python cannot print it.
>>> Ok, that's a wrong statement. My bad, sorry. Looks like newer versions of Microsoft Excel do have cell "EGF4".
>>> That may not be the cell you want to print. My guess is that your program is printing "None" instead of nothing.
>>> Your spreadsheet's cell "EGF4" is empty, most likely.
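
For what it's worth, here is a minimal sketch of how the four points could fit together. It is not tested against your workbook: a plain dictionary named `sheet` stands in for the worksheet (with openpyxl the read would be `ws[coordinate].value`), the data is invented, `annotate` is a hypothetical helper, and the fallback is shortened to 'Unknown'. It also moves `row_no += 1` outside the if/else; in the code as posted the increment sits inside the else branch, so it looks like a matched row would never advance.

```python
# Stand-in for one worksheet: coordinate -> value (made-up data).
# With openpyxl the equivalent read would be ws[coordinate].value.
sheet = {
    'E4': 73,  'F4': 'C', 'G4': 'T',
    'E5': 497, 'F5': 'T', 'G5': 'C',
    'E6': 999, 'F6': 'A', 'G6': 'G',
}

# Built once, before looping over any files; keys are (POS, REF, ALT) strings.
dpos = {('73', 'C', 'T'): '1', ('497', 'T', 'C'): '5'}

pos_col, ref_col, alt_col, score_col = 'E', 'F', 'G', 'V'

def annotate(ws, scores, start_row=4):
    """Fill the score column row by row until the POS column is empty."""
    row_no = start_row
    while ws.get(pos_col + str(row_no)) is not None:
        # One coordinate per column: 'E4', 'F4', 'G4', not 'EGF4'.
        key = tuple(str(ws[col + str(row_no)])
                    for col in (pos_col, ref_col, alt_col))
        ws[score_col + str(row_no)] = scores.get(key, 'Unknown')
        row_no += 1      # outside any if/else, so matched rows advance too
    return ws

annotate(sheet, dpos)
print([sheet['V4'], sheet['V5'], sheet['V6']])
```

The order of the columns in the key has to match the order used when the dictionary was built from the text file.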

Last edited by durden_tyler; 06-27-2017 at 05:15 PM..
# 21  
Old 06-27-2017
Quote:
Originally Posted by durden_tyler
1) Remove the parameter, not the function call. In other words, do call the function, but don't pass any parameter to it.

2) Change the signature of the function so that it does not accept any parameter.

3) Call the function only once. Calling the same function that reads the same text file and returns the same dictionary every time you find an Excel spreadsheet is extremely inefficient.

4) Check what you are trying to print.
Code:
pos_col_no = 'E'
alt_col_no = 'G'
ref_col_no = 'F'
row_no = 4

So,
Code:
pos_col_no + alt_col_no + ref_col_no + str(row_no) == 'EGF4'

There is no cell called "EGF4" in any Excel spreadsheet.
Hence Python cannot print it.
>>> Ok, that's a wrong statement. My bad, sorry. Looks like newer versions of Microsoft Excel do have cell "EGF4".
>>> That may not be the cell you want to print. My guess is that your program is printing "None" instead of nothing.
>>> Your spreadsheet's cell "EGF4" is empty, most likely.
I'm using LibreOffice, an older version, not Microsoft Excel. So what would be the appropriate way of calling E4:G4:F4 -> V4?

What do you mean by point 2? Will it then be
dpos = get_text_data(txt_filename)

Last edited by nans; 06-27-2017 at 06:40 PM..