Appending a column in xlsx file using Python


 
# 15  
06-26-2017
Thank you! That makes sense. Now when I print dict_pos, it seems to have formed a dictionary (I'm not posting all of it, as it's large):

Code:
/usr/bin/python2.7 /home/test/annotate.py
S12.xlsx
{'4300': '5', '3921': '1', '9072': '1', '16343': '1', '14007': '1', '13759': '1', '14911': '1', '14178': '1', '14179': '1', '16140': '1', '13359': '1', '4024': '1', '4025': '1'}

But when writing to the Excel worksheet, instead of writing the score, all values end up as 'Unknown_June2017'.

Also, is it possible to form a dictionary with multiple keys? For example, I need the first three columns in 'score.txt' to be associated with the score value, and that needs to be compared with columns 5, 6 and 7 from the worksheet.

Code:
dict_pos[x[0], x[1], x[2]] = x[3]


Last edited by nans; 06-26-2017 at 01:22 PM..
# 16  
06-26-2017
Quote:
Originally Posted by nans
...But when writing to the Excel worksheet, instead of writing the score, all values end up as 'Unknown_June2017'.

Also, is it possible to form a dictionary with multiple keys? For example, I need the first three columns in 'score.txt' to be associated with the score value, and that needs to be compared with columns 5, 6 and 7 from the worksheet.

Code:
dict_pos[x[0], x[1], x[2]] = x[3]

My hunch is that you are looking at the wrong column.
If your "pos_col_no" is "F" and "row_no" is 4, then the code will look at cells F4, F5, F6, F7, F8, ... and check whether their values are keys of the dictionary "dpos".

Since you see 'Unknown_June2017' in cells V4, V5, V6, V7, V8, ..., none of those values were found in "dpos"; that means the keys are not in column F but in some other column.
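To find out which column actually holds the keys, one option is to count, for each column, how many of its values are keys of dpos. A minimal Python 3 sketch, assuming the rows are available as plain lists of cell values (recent openpyxl versions can yield these via `ws.iter_rows(values_only=True)`); `key_hits_per_column` is a hypothetical helper and the sample rows and dictionary are made up for illustration.

```python
def key_hits_per_column(rows, dpos):
    """Return {column_index: count of values found in dpos} (0-based index)."""
    hits = {}
    for row in rows:
        for col_index, value in enumerate(row):
            if value is None:
                continue  # empty cell
            if str(value) in dpos:  # same str() conversion as the thread's code
                hits[col_index] = hits.get(col_index, 0) + 1
    return hits


# Made-up sample: the keys live in the second column, not the first.
dpos = {'4300': '5', '3921': '1'}
rows = [
    ['x', 4300, 'C'],
    ['y', 3921, 'T'],
    ['z', 9999, 'A'],
]
print(key_hits_per_column(rows, dpos))  # {1: 2}
```

Index 0 is column A, so `{1: 2}` points at column B in this sample.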

Yes, it's possible to use several values together as a single dictionary key.
You can use a Python data structure called a "tuple" for that.
Tuples are usually written with parentheses around their elements, e.g.
Code:
('a', 'b', 'c')

is a tuple.
The code works without parentheses (for the most part), but it's better to include them to avoid ambiguity.
Like so:

Code:
>>>
>>>
>>> color_mix = {}
>>>
>>> color_mix['red', 'blue'] = 'purple'          # works without parentheses
>>>
>>> color_mix[('blue', 'yellow')] = 'green'      # although it's customary to use them
>>>
>>> color_mix['yellow', 'red'] = 'orange'
>>>
>>> for k in color_mix.keys():
...     print k
...
('blue', 'yellow')
('red', 'blue')
('yellow', 'red')
>>>
>>>

Comparing tuples is easy:

Code:
>>>
>>> tuple1 = ('cat', 'dog')
>>> tuple2 = ('dog', 'rat')
>>> tuple3 = ('cat', 'dog')
>>>
>>> tuple1 == tuple2
False
>>>
>>> tuple1 == tuple3
True
>>>
>>> ('hog', 'eel') == tuple1    # Works like this too
False
>>>
>>>

But this is where parentheses are important:

Code:
>>>
>>> 'hog', 'eel' == tuple3      # Nope! Not what you would expect!
('hog', False)
>>>
>>> ('hog', 'eel') == ('dog', 'rat')  # Use parentheses to avoid surprises
False
>>>
>>>
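Applied to the question about scores.txt, the first three fields can be combined into a tuple key while the file is read. A Python 3 sketch, assuming a tab-separated file with one header line and the column order POS, REF, ALT, score implied by the later code in this thread; `SAMPLE` and `io.StringIO` stand in for the real file, and `load_scores` is a hypothetical name.

```python
import csv
import io

# Made-up stand-in for scores.txt: a header line, then tab-separated fields.
SAMPLE = "POS\tREF\tALT\tSCORE\n497\tT\tC\t1\n2196\tC\tT\t1\n"


def load_scores(handle):
    """Map (field1, field2, field3) tuples to field4, skipping the header."""
    scores = {}
    reader = csv.reader(handle, delimiter='\t')
    next(reader, None)  # skip the header line
    for fields in reader:
        if len(fields) < 4:
            continue  # ignore blank or short lines
        scores[(fields[0], fields[1], fields[2])] = fields[3]
    return scores


scores = load_scores(io.StringIO(SAMPLE))
print(scores[('497', 'T', 'C')])      # 1
print(('2196', 'C', 'T') in scores)   # True
print(('2196', 'T', 'C') in scores)   # False: element order matters
```

With the real file you would pass `open('/home/test/scores.txt', newline='')` instead of the StringIO object.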

# 17  
06-27-2017
Thank you.
Yes, I was using the wrong column! Here is my final code:

Code:
#!/usr/bin/python

import sys
import os
from openpyxl import load_workbook
from datetime import datetime
from pandas import read_table
import csv
from collections import namedtuple

# Variables
sheet_directory = r'/home/test'

# Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    Scores = namedtuple("Scores", ["POS", "ALT", "REF"])
    first_line = True
    with open('/home/test/scores.txt') as txt_filename:
        for line in txt_filename:
            if first_line:
                first_line = False
                continue
            line = line.rstrip('\n')
            x = line.split('\t')
            cpos = Scores(POS=x[0], ALT=x[2], REF=x[1])
            dict_pos[cpos] = x[3]
        print dict_pos          
        return dict_pos


def process_xl_sheets():
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                print(sheet_file)
                dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))  ##what exactly is this part doing ? There is only one text file 'score.txt' to be referenced against several xlsx files named S12.xlsx , S13.xlsx etc
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('raw_data')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                #compare = Scores(POS=pos_col_no, ALT=alt_col_no, REF=ref_col_no)
                #cell = ws[compare + str(row_no) ]
                cell = ws[pos_col_no + alt_col_no + ref_col_no + str(row_no)]
                print cell.value           ##doesn't print
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1    # advance to the next row whether or not the key was found
                    cell = ws[pos_col_no + str(row_no)]
                wb.save(sheet_xl_file)

# Main section
process_xl_sheets()

When I run it now, it doesn't print cell.value. What I'm attempting to do is make the code compare the three columns in the Excel file to the three columns in the text file, so that it can output the corresponding score:

Code:
/usr/bin/python2.7 /home/test/annotate.py
S12.xlsx
{Scores(POS='73', ALT='C', REF='CN'): 'A', Scores(POS='497', ALT='C', REF='T'): '1', Scores(POS='2196', ALT='T', REF='C'): '1', Scores(POS='2080', ALT='C', REF='A'): '1', Scores(POS='2456', ALT='C', REF='T'): '1'}
None
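The output above hints at why every row ends up as 'Unknown_...': the keys of dpos are three-field Scores tuples, while the loop tests a single string, str(cell.value). A short Python 3 illustration using one entry copied from the printed dictionary; the column roles (E = POS, F = REF, G = ALT) are taken from the variable names in the code and are an assumption about the sheet, and the cell values are hypothetical.

```python
from collections import namedtuple

Scores = namedtuple("Scores", ["POS", "ALT", "REF"])

# One entry shaped like the dictionary printed above.
dpos = {Scores(POS='497', ALT='C', REF='T'): '1'}

# The loop tests str(cell.value) in dpos: one string never equals a 3-field key.
print('497' in dpos)                                        # False

# A namedtuple is a tuple, so a plain tuple in field order (POS, ALT, REF) matches.
print(('497', 'C', 'T') in dpos)                            # True

# Sheet order E, F, G is POS, REF, ALT, so the values must be reordered.
e_value, f_value, g_value = '497', 'T', 'C'                 # hypothetical cell values
print((e_value, f_value, g_value) in dpos)                  # False
print(Scores(POS=e_value, ALT=g_value, REF=f_value) in dpos)  # True
```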


Last edited by nans; 06-27-2017 at 09:35 AM.. Reason: updated code
# 18  
06-27-2017
I haven't gone through the entire code yet, and it might be tricky to test your code since I don't have pandas, but let me answer your question about the function call.

Quote:
Originally Posted by nans
...
Code:
#!/usr/bin/python
...
...
# Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    Scores = namedtuple("Scores", ["POS", "ALT", "REF"])
    first_line = True
    with open('/home/test/scores.txt') as txt_filename:
        for line in txt_filename:
            if first_line:
                first_line = False
                continue
            line = line.rstrip('\n')
            x = line.split('\t')
            cpos = Scores(POS=x[0], ALT=x[2], REF=x[1])
            dict_pos[cpos] = x[3]
        print dict_pos          
        return dict_pos


def process_xl_sheets():
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                print(sheet_file)
                dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))  ##what exactly is this part doing ? There is only one text file 'score.txt' to be referenced against several xlsx files named S12.xlsx , S13.xlsx etc
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
...
...
# Main section
process_xl_sheets()

...
...
The replace() method returns a copy of the string in sheet_file with '.xlsx' replaced by '.txt'.
The string variable sheet_file holds the name of your Excel file.
So, let's say while looping through the sheet_directory, your Python program finds an Excel file called "S12.xlsx". Then sheet_file will equal "S12.xlsx".

Thereafter, this expression:
Code:
sheet_file.replace('.xlsx', '.txt')

replaces '.xlsx' with '.txt' and returns 'S12.txt'.

And then this value 'S12.txt' is passed to the function get_text_data().
That is, the value of the string parameter txt_filename is 'S12.txt'.

You can see this very quickly by printing txt_filename the moment you enter the function.
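A quick check of what replace() returns, in Python 3 syntax. As a side note (not from the original post), str.replace() changes every occurrence of '.xlsx' in the name, whereas os.path.splitext() splits off only the final extension.

```python
import os.path

sheet_file = 'S12.xlsx'

# str.replace() returns a new string; sheet_file itself is unchanged.
txt_name = sheet_file.replace('.xlsx', '.txt')
print(txt_name)      # S12.txt
print(sheet_file)    # S12.xlsx

# replace() rewrites every occurrence, not only the extension:
print('my.xlsx.backup.xlsx'.replace('.xlsx', '.txt'))   # my.txt.backup.txt

# os.path.splitext() separates only the last extension:
root, ext = os.path.splitext('my.xlsx.backup.xlsx')
print(root + '.txt')  # my.xlsx.backup.txt
```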

In the "with" statement inside the function "get_text_data", however, you reuse the name txt_filename. That rebinds txt_filename from the string parameter to a file object.

From then until the end of the function "get_text_data", txt_filename remains a file object.
So essentially, you are not using the txt_filename parameter in your function at all.
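The rebinding can be seen in isolation with a cut-down, hypothetical function that mirrors the pattern; io.StringIO stands in for the real file so the snippet runs anywhere (Python 3).

```python
import io


def rebinding_demo(txt_filename):
    """Hypothetical, cut-down copy of the pattern used in get_text_data().

    io.StringIO stands in for open('/home/test/scores.txt').
    """
    kinds = [type(txt_filename).__name__]           # the argument: a str
    with io.StringIO(u"POS\tREF\tALT\tSCORE\n") as txt_filename:
        kinds.append(type(txt_filename).__name__)   # same name, now a file object
    return kinds


print(rebinding_demo('S12.txt'))   # ['str', 'StringIO']
```

Whatever string is passed in, it is discarded as soon as the "with" line runs.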

My suggestion: don't pass a parameter to a function if you are not using it. You have the text file name ("scores.txt") and its location hard-coded anyway.
If something is not needed, discard it. Keep it simple.
# 19  
06-27-2017
Ah no worries.

If I remove that, what would be the best way to proceed?

Code:
#dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('Unannotated')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                #compare = Scores(POS=pos_col_no, ALT=alt_col_no, REF=ref_col_no)
                #cell = ws[compare + str(row_no) ]
                cell = ws[pos_col_no + alt_col_no + ref_col_no + str(row_no)]
                print cell.value
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1    # advance to the next row whether or not the key was found
                    cell = ws[pos_col_no + str(row_no)]

# 20  
06-27-2017
Quote:
Originally Posted by nans
...
If I remove that, then what would be the best way to proceed further
...
Code:
#dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('Unannotated')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                #compare = Scores(POS=pos_col_no, ALT=alt_col_no, REF=ref_col_no)
                #cell = ws[compare + str(row_no) ]
                cell = ws[pos_col_no + alt_col_no + ref_col_no + str(row_no)]
                print cell.value
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1    # advance to the next row whether or not the key was found
                    cell = ws[pos_col_no + str(row_no)]

1) Remove the parameter, not the function call. In other words, do call the function, but don't pass any parameter to it.

2) Change the signature of the function so that it does not accept any parameter.

3) Call the function only once. Calling the same function that reads the same text file and returns the same dictionary every time you find an Excel spreadsheet is extremely inefficient.

4) Check what you are trying to print.
Code:
pos_col_no = 'E'
alt_col_no = 'G'
ref_col_no = 'F'
row_no = 4

So,
Code:
pos_col_no + alt_col_no + ref_col_no + str(row_no)    # evaluates to 'EGF4'

There is no cell called "EGF4" in any Excel spreadsheet.
Hence Python cannot print it.
>>> Edit: that statement is wrong. My bad, sorry. Microsoft Excel 2007 and later support columns up to XFD, so a cell "EGF4" does exist.
>>> It is probably not the cell you want to print, though. My guess is that your program is printing "None" rather than nothing,
>>> because cell "EGF4" in your spreadsheet is most likely empty.

Last edited by durden_tyler; 06-27-2017 at 06:15 PM..
# 21  
06-27-2017
Quote:
Originally Posted by durden_tyler
1) Remove the parameter, not the function call. In other words, do call the function, but don't pass any parameter to it.

2) Change the signature of the function so that it does not accept any parameter.

3) Call the function only once. Calling the same function that reads the same text file and returns the same dictionary every time you find an Excel spreadsheet is extremely inefficient.

4) Check what you are trying to print.
Code:
pos_col_no = 'E'
alt_col_no = 'G'
ref_col_no = 'F'
row_no = 4

So,
Code:
pos_col_no + alt_col_no + ref_col_no + str(row_no)    # evaluates to 'EGF4'

There is no cell called "EGF4" in any Excel spreadsheet.
Hence Python cannot print it.
>>> Edit: that statement is wrong. My bad, sorry. Microsoft Excel 2007 and later support columns up to XFD, so a cell "EGF4" does exist.
>>> It is probably not the cell you want to print, though. My guess is that your program is printing "None" rather than nothing,
>>> because cell "EGF4" in your spreadsheet is most likely empty.
I'm using LibreOffice (an older version), not Microsoft Excel. So what would be the appropriate way to look up E4, G4 and F4 together and write the result to V4 (E4:G4:F4 -> V4)?

What do you mean by point 2? Will it then be:
dpos = get_text_data(txt_filename)
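For the E4:G4:F4 -> V4 question, one approach is to build each coordinate separately ('E' + str(row_no), and so on) instead of concatenating the three letters into 'EGF4', then look the three values up together as one key. This Python 3 sketch assumes the column roles from the thread's variable names (E = POS, F = REF, G = ALT, V = score) and that whole numbers may come back from the sheet as floats. `as_text`, `score_cells`, `FakeCell` and `FakeSheet` are hypothetical names; the fake sheet mimics openpyxl's `ws['E4'].value` reads and `ws['V4'] = ...` writes so the logic can be tried without a spreadsheet.

```python
from collections import namedtuple

Scores = namedtuple("Scores", ["POS", "ALT", "REF"])


def as_text(value):
    """Turn a cell value into the text form used in scores.txt (assumption)."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)          # 497.0 -> 497, so str() gives '497'
    return str(value)


def score_cells(ws, dpos, unknown, first_row=4,
                pos_col='E', ref_col='F', alt_col='G', score_col='V'):
    """Write a score into score_col for each row until the POS cell is empty."""
    row_no = first_row
    while ws[pos_col + str(row_no)].value is not None:
        row = str(row_no)
        key = Scores(POS=as_text(ws[pos_col + row].value),
                     ALT=as_text(ws[alt_col + row].value),
                     REF=as_text(ws[ref_col + row].value))
        ws[score_col + row] = dpos.get(key, unknown)
        row_no += 1                 # advance on every row, matched or not
    return row_no - first_row       # number of rows processed


class FakeCell(object):
    """Stand-in for an openpyxl cell: just a .value attribute."""
    def __init__(self, value=None):
        self.value = value


class FakeSheet(object):
    """Stand-in for a worksheet, keyed by coordinates like 'E4'."""
    def __init__(self, values):
        self._cells = {coord: FakeCell(v) for coord, v in values.items()}

    def __getitem__(self, coord):
        return self._cells.setdefault(coord, FakeCell())   # empty cell -> None

    def __setitem__(self, coord, value):
        self[coord].value = value


ws = FakeSheet({'E4': 497, 'F4': 'T', 'G4': 'C',
                'E5': 2196.0, 'F5': 'C', 'G5': 'T',
                'E6': 73, 'F6': 'X', 'G6': 'Y'})
dpos = {Scores(POS='497', ALT='C', REF='T'): '1',
        Scores(POS='2196', ALT='T', REF='C'): '1'}
rows_done = score_cells(ws, dpos, 'Unknown_June2017')
print(rows_done)                                    # 3
print([ws['V%d' % r].value for r in (4, 5, 6)])     # ['1', '1', 'Unknown_June2017']
```

With a real workbook you would pass the openpyxl worksheet instead of FakeSheet and then call wb.save(...) as the thread's code does; dpos would come from a single get_text_data() call made before the loop over files.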

Last edited by nans; 06-27-2017 at 07:40 PM..