Appending a column in xlsx file using Python

Tags
append, excel, openpyxl, overwrite, python

# 36  
07-03-2017
Quote:
Originally Posted by nans
Correct, those values are of (E4, G4, F4). But with "row_no += 1", shouldn't it pick up the values from the next row?
...
...
No, it should not.
It will not.
No programming language will do that for you.
Golden rule of programming => a programming language will only do what you tell it to do. It will not do anything on its own.

Quote:
Originally Posted by nans
...
I thought this while loop meant 'go to row 4, if the row matches the score from the dictionary, print the value or else print unknown and then go on to the next row and do the same'
...
The "while" loop is only for repeating a "set of actions" while a "condition" is true.

The repetitive "set of actions" is:
1) going to next row
2) determining pos value, alt value, ref value for the row we reached
3) constructing the cpos value from the 3 values determined in the step above
4) checking if this cpos value is in the dictionary dict_pos
5) setting the value of the "score" cell in the row we are now on

Since 1) through 5) are repetitive, we work on the entire spreadsheet, but one row at a time.
Inside a "while" loop, we are only working on one row - the current row.

The "condition" is: either pos value is non-empty or alt value is non-empty or ref value is non-empty.

As you can see, the structure of a "while" loop is very generic (repeat a set of actions while a condition is true), so it can be used in a wide variety of situations in any programming language.
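The steps and the condition above can be sketched in plain Python. This is a minimal sketch, not openpyxl code: it assumes a hypothetical `sheet` dict keyed by cell address (e.g. 'E4') standing in for the worksheet, and a `scores` dict standing in for dict_pos. The first row is read before the loop, so step 1 (going to the next row) happens at the end of each iteration.

```python
def read_row(sheet, row_no):
    """Return (pos, alt, ref) from columns E, G, F of one row.

    `sheet` is a hypothetical stand-in for a worksheet: a dict mapping
    addresses such as 'E4' to values; a missing address is an empty cell.
    """
    return (sheet.get('E%d' % row_no),
            sheet.get('G%d' % row_no),
            sheet.get('F%d' % row_no))


def annotate_rows(sheet, scores, start_row=4):
    row_no = start_row
    pos, alt, ref = read_row(sheet, row_no)
    # Condition: repeat while pos, alt or ref is non-empty
    while pos or alt or ref:
        # Build the key from the current row only
        cpos = (pos, alt, ref)
        # Check the key and set the score cell (column V) of the current row;
        # 'Unknown' is a simplified stand-in for the 'Unknown_<month><year>' label
        sheet['V%d' % row_no] = scores.get(cpos, 'Unknown')
        # Go to the next row and read its pos, alt and ref values
        row_no += 1
        pos, alt, ref = read_row(sheet, row_no)
    return sheet


# Sample values borrowed from the output posted in this thread
sheet = {'E4': '73', 'G4': 'G', 'F4': 'A',
         'E5': '114', 'G5': 'T', 'F5': 'C',
         'E6': '263', 'G6': 'G', 'F6': 'A'}
scores = {('73', 'G', 'A'): '11', ('114', 'T', 'C'): '1'}
annotate_rows(sheet, scores)
print(sheet['V4'], sheet['V5'], sheet['V6'])
```

Row 7 has no E, G or F values, so the loop stops there without any fixed row limit.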

Quote:
Originally Posted by nans
...
...
How would I recalculate the values in a loop for all rows?

Code:
                while row_no <=10:
                    if (dict_pos.has_key(cpos)):
                        NG = dict_pos[cpos]
                        #print dict_pos[cpos]
                    else:
                        NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1
                    print row_no
                    cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
                    print cpos
                wb.save(sheet_xl_file)

...
...
Old proverb: "How to eat an elephant? One bite at a time."

As I mentioned earlier, inside a loop, you only have to calculate the pos, alt and ref values for one row - the current row, the row you are on.

Now, for row_no = 4, you had a few statements that calculated cell_pos, cell_alt and cell_ref. Then you calculated "cpos" from those cell_pos, cell_alt and cell_ref values.

You need those statements inside the "while" loop.
So once the row_no is incremented inside the loop, you:
1) determine the cell_pos, cell_alt and cell_ref
2) next you calculate the cpos using cell_pos, cell_alt and cell_ref
This will be your "cpos" - only for the current row.
When printed, you will see different values of cpos - printed one at a time from within your "while" loop.

(The output that I showed in my earlier post was not printed in one shot - it was printed during each iteration of the loop - one line printed per iteration.)
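To make the difference concrete, here is a small sketch in plain Python (no openpyxl). The list `rows` is hypothetical sample data, one (pos, alt, ref) tuple per row, and `row_no` is a 0-based list index rather than a spreadsheet row number. Computing cpos once before the loop repeats the first row's key on every iteration; recomputing it inside the loop gives one key per row.

```python
rows = [('73', 'G', 'A'), ('114', 'T', 'C'), ('263', 'G', 'A')]


def keys_stale(rows):
    """Buggy pattern: the key is computed once, before the loop."""
    seen = []
    row_no = 0
    cpos = rows[row_no]          # computed for the first row only
    while row_no < len(rows):
        seen.append(cpos)        # same key on every iteration
        row_no += 1              # row_no changes, but cpos does not
    return seen


def keys_fresh(rows):
    """Fixed pattern: the key is recomputed for the current row."""
    seen = []
    row_no = 0
    while row_no < len(rows):
        cpos = rows[row_no]      # recomputed every time row_no changes
        seen.append(cpos)
        row_no += 1
    return seen


print(keys_stale(rows))
print(keys_fresh(rows))
```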

Last edited by durden_tyler; 07-03-2017 at 12:05 PM.
The Following User Says Thank You to durden_tyler For This Useful Post:
nans (07-05-2017)
# 37  
07-04-2017
Okay, I'm slowly getting there now! But why isn't it saving the results to the Excel sheet?

Code:
     for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                #print(sheet_file)
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('Unannotated')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                row_no = 4
                NG = ws[score_col_no + str(row_no)]
                #print row_no
                cell_pos = ws[pos_col_no + str(row_no)]
                cell_ref = ws[ref_col_no + str(row_no)]
                cell_alt = ws[alt_col_no + str(row_no)]
                cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
                print cpos  ##prints only row E4:G4:F4
                #while cpos:
                while row_no <=10:
                    if (dict_pos.has_key(cpos)):
                        NG = dict_pos[cpos]
                        print dict_pos[cpos]
                    else:
                        NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1
                    #print row_no
                    cell_pos = ws[pos_col_no + str(row_no)]
                    cell_ref = ws[ref_col_no + str(row_no)]
                    cell_alt = ws[alt_col_no + str(row_no)]
                    cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
                    print cpos
                    if (dict_pos.has_key(cpos)):
                        NG = dict_pos[cpos]
                    else:
                        NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                        print NG
                wb.save(sheet_xl_file)
process_xl_sheets()

Code:
/usr/bin/python2.7 annotate_new_dict.py
Scores(POS=u'73', ALT=u'G', REF=u'A')
11
Scores(POS=u'114', ALT=u'T', REF=u'C')
1
Scores(POS=u'263', ALT=u'G', REF=u'A')
Unknown_July2017
Scores(POS=u'309', ALT=u'T', REF=u'C')
Unknown_July2017
Scores(POS=u'497', ALT=u'T', REF=u'C')
Unknown_July2017
Scores(POS=u'513', ALT=u'G', REF=u'GCA')
Unknown_July2017
Scores(POS=u'750', ALT=u'G', REF=u'A')
1
Scores(POS=u'1189', ALT=u'C', REF=u'T')

Process finished with exit code 0

# 38  
07-04-2017
Quote:
Originally Posted by nans
Okay, I'm slowly getting there now! But why isn't it saving the results to the Excel sheet?
...
Yes, much better.
Now you are able to capture the values of E,G,F columns of every row you are going through.

As for saving the results, here are a few things you need to know about the "openpyxl" module.
It contains classes such as "Workbook", "Worksheet" and "Cell".

If "ws" is a "worksheet" object, then:
Code:
ws['V4']

is the cell object 'V4' in the worksheet.

The cell object has many attributes like "value", "fill", "comment" etc.

If you want to make changes to a cell, you set its attributes.
So, to set the cell's value, you set its "value" attribute. For example:
Code:
ws['V4'].value = 'Hello, World!'

will set the value of the cell 'V4' to 'Hello, World!'

Another example; the following:
Code:
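from openpyxl.styles import PatternFill
ws['V4'].fill = PatternFill(fill_type="solid", fgColor="FF0000")

will fill the cell 'V4' with solid red. (For a solid fill, Excel displays the foreground color, fgColor, so that is the one to set.)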

Now in your code, the following statement:
Code:
NG = ws[score_col_no + str(row_no)]

assigns the cell object 'V4' (not its value) to variable NG.

However, this line:
Code:
NG = dict_pos[cpos]

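does not touch the cell at all. It simply rebinds the name NG to the score taken from dict_pos, so nothing is written to the worksheet. To change the cell, you have to set its "value" attribute, as in the 'Hello, World!' example above.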
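The difference between rebinding a name and setting an attribute can be seen without openpyxl. The sketch below uses a hypothetical `Cell` class (not openpyxl's Cell) with only a `value` attribute, and a dict standing in for the worksheet:

```python
class Cell(object):
    """Hypothetical stand-in for a spreadsheet cell with a value attribute."""
    def __init__(self, value=None):
        self.value = value


sheet = {'V4': Cell(), 'V5': Cell()}

NG = sheet['V4']          # NG now refers to the cell object V4
NG = '11'                 # rebinds the name NG to a string; V4 is untouched
print(sheet['V4'].value)  # still None

NG = sheet['V5']          # NG now refers to the cell object V5
NG.value = '11'           # sets the attribute on the cell object itself
print(sheet['V5'].value)  # now '11'
```

Only the second form changes something that `wb.save()` would later write out.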

Another thing => you need to recalculate "NG" inside the loop every time "row_no" increases.
Also => remove the second "if .. else" statement inside the "while" loop.

Last edited by durden_tyler; 07-04-2017 at 06:37 PM. Reason: Incorrect statement about assignment.
The Following User Says Thank You to durden_tyler For This Useful Post:
nans (07-05-2017)
# 39  
07-04-2017
Got it! So I have changed it to
Code:
if (dict_pos.has_key(cpos)):
    NG.value = dict_pos[cpos]
else:
    NG.value = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
    print NG

One last question: right now the code keeps reading rows that don't have values and ends up in an infinite loop. For now I have limited the rows with 'while row_no <=10:', but different Excel sheets have different row limits (some have 50, some have 100). How can I amend this?
# 40  
07-04-2017
Quote:
Originally Posted by nans
Got it! So I have changed it to
...

One last question: right now the code keeps reading rows that don't have values and ends up in an infinite loop. For now I have limited the rows with 'while row_no <=10:', but different Excel sheets have different row limits (some have 50, some have 100). How can I amend this?
Awesome!
So the final step is just some tweaking in the "while <condition>" clause.
I had suggested the condition: "while row_no <= 10" so that you could see the "cpos" value for the first few rows (7 rows actually: row_no 4 through 10).
Without it, the "infinite loop" prints lines too fast, which makes it difficult to see what's happening.

The "infinite loop" problem still remains. The "row_no <= 10" was just a temporary patch.

Now, if you look at post # 36 in this thread, I had mentioned the working of a "while" statement.
The "while" statement repeats a set of actions while a condition is true.

So ask yourself this: in any worksheet, until when should I keep looking at the values of E, G, F cells and form the key and check against "dict_pos"?
If you were to update your worksheet manually, then up to what point would you go? When would you stop?
The answer to that question will decide what you want to put in the "while" condition.

If you ask me, I would do it manually until one of the cells E, G or F is empty. The moment one or more of them is empty, I would stop.
But I don't know what your requirements are.
Maybe you want to go until all the cells E, G, and F are empty.
So your condition will differ from mine.
Either way, you will have to use cell values in the while condition.
Have a look at post # 4 of this thread where I posted the first program.
In that program, I test the cell value in the while condition.
Use it to form your "while" condition.
Post your attempt or ask a question if you find it difficult to continue.
The Following User Says Thank You to durden_tyler For This Useful Post:
nans (07-05-2017)
# 41  
07-05-2017
Great. I've changed it to 'while cell_pos.value:' and this works perfectly for my script.
Out of curiosity, if I wanted to go until all the cells E, G, and F are empty, how would I go about that?

I really appreciate how patient and extremely helpful you have been over the past couple of weeks, following up on my issues and helping me out with tutorials. Thank you very much, and good luck and best wishes! :-)
# 42  
07-05-2017
Quote:
Originally Posted by nans
Great. I've changed it to 'while cell_pos.value:' and this works perfectly for my script.
Out of curiosity, if I wanted to go until all the cells E, G, and F are empty, how would I go about that?
...
Congratulations!
Code:
while cell_pos.value:

will run the "while" loop as long as cell_pos.value is true in Python's sense, that is, non-empty. (openpyxl returns None for an empty cell, and None counts as false.)
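One caveat, as a hedged side note: this is a truthiness test, so an empty string and the number 0 also count as false. If a cell could legitimately hold 0, a stricter check avoids stopping early. The helper `is_empty` below is a hypothetical name used only for illustration:

```python
def is_empty(value):
    """Treat None and blank strings as empty; keep 0 as a real value."""
    if value is None:
        return True
    if hasattr(value, 'strip'):   # str (and unicode in Python 2)
        return value.strip() == ''
    return False


for value in (None, '', '   ', 0, '73', 73):
    # Compare plain truthiness with the stricter emptiness check
    print(repr(value), bool(value), is_empty(value))
```

With it, the loop condition could read, for example, `while not is_empty(cell_pos.value):`.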

To check the three cells E, G, F together, we use the logical operators "and" / "or" to combine the three cell values:
1) cell_pos.value
2) cell_alt.value
3) cell_ref.value

Such a condition is called a "compound" condition.
So:

Code:
while cell_pos.value and cell_alt.value and cell_ref.value:

will enter the "while" loop as long as all of cells E, G, F are non-empty i.e. they have some value in them. The moment any one of the cells E, G, F is empty, the loop stops.

and
Code:
while cell_pos.value or cell_alt.value or cell_ref.value:

will enter the "while" loop as long as any one of cells E, G, F is non-empty. The moment all of cells E, G, F are empty, the loop stops.
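The two conditions only behave differently when a row is partly filled. A minimal sketch, assuming hypothetical rows given as (pos, alt, ref) tuples with None for an empty cell, and that the list ends with a completely empty row (as a worksheet does below the data):

```python
def all_filled(pos, alt, ref):
    """'and' condition: all of E, G, F are non-empty."""
    return bool(pos and alt and ref)


def any_filled(pos, alt, ref):
    """'or' condition: at least one of E, G, F is non-empty."""
    return bool(pos or alt or ref)


def rows_processed(rows, condition):
    """Count the rows a while loop with the given condition visits."""
    row_no = 0
    pos, alt, ref = rows[row_no]
    while condition(pos, alt, ref):
        row_no += 1
        pos, alt, ref = rows[row_no]
    return row_no


rows = [('73', 'G', 'A'),
        ('114', None, 'C'),    # alt is empty
        (None, None, None)]    # completely empty row

print(rows_processed(rows, all_filled))  # stops at the partly filled row
print(rows_processed(rows, any_filled))  # stops at the empty row
```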

Your program will loop through the rows checking only the pos value.
So if, in a row, the pos value is non-empty but alt and/or ref values are empty, it will still form the key and try to check if the key exists in the dictionary dict_pos.
This may or may not work, depending on how the dictionary was formed from "scores.txt" text file.
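One way the lookup can silently fail: the keys built from scores.txt hold strings (line.split() returns strings), while a cell may hold a number if Excel stored the position as numeric. Namedtuples compare field by field, so '73' and 73 do not match; converting pos with str(), as the program below does, is presumably meant to handle this. The cell values here are hypothetical:

```python
from collections import namedtuple

Scores = namedtuple("Scores", ["POS", "ALT", "REF"])

# Key as built from a line of scores.txt: every field is a string
dict_pos = {Scores(POS='73', ALT='G', REF='A'): '11'}

# Hypothetical cell values: the position stored as a number in Excel
pos, alt, ref = 73, 'G', 'A'

print(Scores(POS=pos, ALT=alt, REF=ref) in dict_pos)       # int 73 != '73'
print(Scores(POS=str(pos), ALT=alt, REF=ref) in dict_pos)  # '73' matches
# Note: a float such as 73.0 would still not match, since str(73.0) is '73.0'
```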

Here's the complete program for your reference:

Code:
#!/usr/bin/python
import os
from datetime import datetime
from collections import namedtuple
from openpyxl import load_workbook

# Variables
sheet_directory = '<absolute_path_till_sheet_directory>'
txt_file = '<absolute_path_till_text_directory>/scores.txt'

def process_xl_sheets():
    # Process the text file and form the dictionary of positions
    dict_pos = {}
    Scores = namedtuple("Scores", ["POS", "ALT", "REF"])
    first_line = True
    with open(txt_file) as txt_filename:
        for line in txt_filename:
            if not line.strip():   # Skip empty lines
                continue
            if first_line:         # Skip the header
                first_line = False
                continue
            line = line.rstrip('\n')
            x = line.split('\t')
            # Columns in scores.txt (tab-separated): POS, REF, ALT, score
            cpos = Scores(POS=x[0], ALT=x[2], REF=x[1])
            dict_pos[cpos] = x[3]

    # Label written for positions not found in scores.txt, e.g. "Unknown_July2017"
    unknown_label = 'Unknown_' + datetime.now().strftime("%B%Y")

    # Now process all Excel files
    pos_col_no = 'E'
    alt_col_no = 'G'
    ref_col_no = 'F'
    score_col_no = 'V'
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                # Note: with data_only=True, formulas are replaced by their
                # cached values when the workbook is saved
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb['raw_data']
                row_no = 4   # Start again at row 4 for every workbook
                pos = ws[pos_col_no + str(row_no)].value
                alt = ws[alt_col_no + str(row_no)].value
                ref = ws[ref_col_no + str(row_no)].value
                # Stop at the first row where E, G and F are all empty
                while pos or alt or ref:
                    cpos = Scores(POS=str(pos), ALT=alt, REF=ref)
                    if cpos in dict_pos:
                        ws[score_col_no + str(row_no)].value = dict_pos[cpos]
                    else:
                        ws[score_col_no + str(row_no)].value = unknown_label
                    row_no += 1
                    pos = ws[pos_col_no + str(row_no)].value
                    alt = ws[alt_col_no + str(row_no)].value
                    ref = ws[ref_col_no + str(row_no)].value
                # Save each workbook right after updating it
                wb.save(sheet_xl_file)

# Main section
process_xl_sheets()


Last edited by durden_tyler; 07-05-2017 at 12:10 PM. Reason: You learn about the gaps in your thinking by reviewing what you wrote earlier... :)
The Following User Says Thank You to durden_tyler For This Useful Post:
nans (07-05-2017)
All times are GMT -4.

Unix & Linux Forums Content Copyright©1993-2018. All Rights Reserved.