Appending a column in xlsx file using Python


 
Thread Tools Search this Thread
Top Forums Programming Appending a column in xlsx file using Python
# 22  
Old 06-27-2017
Quote:
Originally Posted by nans
...
So what would be the appropriate way of calling E4:G4:F4 -> V4
...
You form 'E4' first and get its cell value.
Then you form 'G4' and get its cell value.
And then do the same for 'F4'.
Then you form "Scores" namedtuple using the cell values of 'E4', 'G4' and 'F4'.
Then you check if that "Scores" namedtuple is a key in dictionary "dpos".
If it is, then get its value from "dpos" and paste in cell 'V4'; otherwise paste the "Unknown" string in 'V4'.
So, given:
Code:
pos_col_no = 'E'
alt_col_no = 'G'
ref_col_no = 'F'
row_no = 4

how do you form 'E4'?

Quote:
Originally Posted by nans
...
...
What do you mean by point 2 ? Will.it then be
dpos = get_text_data(txt_filename)
That is closely related to point 1.
Were you able to implement point 1 first?
The "function call" mentioned in point 1 looks the same as "function signature" mentioned in point 2.

Have a look at this tutorial to see if it helps:
1.7. Print Function, Part I — Hands-on Python Tutorial for Python 3.1
1.11. Defining Functions of your Own — Hands-on Python Tutorial for Python 3.1

Or looking at it another way; in the following statement:
Code:
dpos = get_text_data(txt_filename)

(1) what is the name of the function? and
(2) what is the name of the parameter?

Now, if you remove the parameter, then what will the above statement look like? (Have a look at the tutorial; it tells you what a function parameter is.)
This User Gave Thanks to durden_tyler For This Post:
# 23  
Old 06-28-2017
Thank you. Its just that this language is quite new and I am happy to learn it (just a bit slow!) So based on all your inputs, I have modified the code a bit more, now it prints the correct score based on 3 columns on the command line but
a) does not saves it to the excel sheet and
b) it does not read beyond row 4

So the code now is

Code:
#!/usr/bin/python

import sys
import os
from openpyxl import load_workbook
from datetime import datetime
from pandas import read_table
import csv
from collections import namedtuple
import csv


# Variables
sheet_directory = r'/home/test'

def process_xl_sheets():
    dict_pos = {}
    Scores = namedtuple("Scores", ["POS", "ALT", "REF"])
    first_line = True
    with open('/home/test/scores.txt') as txt_filename:
        for line in txt_filename:
            if first_line:
                first_line = False
                continue
            line = line.rstrip('\n')
            x = line.split('\t')
            cpos = Scores(POS=x[0], ALT=x[2], REF=x[1])
            dict_pos[cpos] = x[3]
            #print dict_pos ##prints

    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                #print(sheet_file) ##prints
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('raw_data')
                pos_col_no = 'E'
                alt_col_no = 'G'
                ref_col_no = 'F'
                score_col_no = 'V'
                NG = score_col_no
                row_no = 4
                # cell = ws[pos_col_no  + str(row_no)]
                cell_pos = ws[pos_col_no + str(row_no)]
                cell_ref = ws[ref_col_no + str(row_no)]
                cell_alt = ws[alt_col_no + str(row_no)]
                NG = ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B")
                cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
                print cpos ##prints only row E4:G4:F4 and not all
                #print dict_pos ##prints all the scores
                if (dict_pos.has_key(cpos)):
                        NG = dict_pos[cpos]
                        print dict_pos[cpos] ##prints the actual score
                else:
                        NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                        row_no += 1
                        cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
    wb.save(sheet_xl_file)

process_xl_sheets()

and the command line results are

Code:
/usr/bin/python2.7 /home/test/new_dict.py

##reads only row 4 
Scores(POS=u'73', ALT=u'G', REF=u'A')

##prints all values from 'score.txt' 
{Scores(POS='2171', ALT='C', REF='T'): '5', Scores(POS='73', ALT='G', REF='A'): '11', Scores(POS='114', ALT='T', REF='C'): '1', Scores(POS='2080', ALT='C', REF='T'): '4', Scores(POS='1189', ALT='C', REF='T'): '1'}
 
##returns the value from score.txt against row 4 of xlsx but does not save it to excel 
11 

 Process finished with exit code 0


Last edited by nans; 06-28-2017 at 08:38 AM..
# 24  
Old 06-28-2017
Quote:
Originally Posted by nans
...
now it prints the correct score based on 3 columns on the command line but
a) does not saves it to the excel sheet and
b) it does not read beyond row 4
...
...
b) You did not tell Python to go beyond row 4! That's why it did not read beyond row 4.

The earlier iterations of this program had the "while" statement in them, which you removed. That "while" statement was for going beyond row 4.

What you want to do is something like the following pseudo-code:

Code:
 Go to row 4.
 Get the values of E4, G4, F4 and cpos.
 If cpos exists as a key in dict_pos, then print the value otherwise print the "unknown" string in cell V4.

 Go to row 5.
 Get the values of E5, G5, F5 and cpos.
 If cpos exists as a key in dict_pos, then print the value otherwise print the "unknown" string in cell V5.

 Go to row 6.
 Get the values of E6, G6, F6 and cpos.
 If cpos exists as a key in dict_pos, then print the value otherwise print the "unknown" string in cell V6.

 Go to row 7.
 Get the values of E7, G7, F7 and cpos.
 If cpos exists as a key in dict_pos, then print the value otherwise print the "unknown" string in cell V7.

 ...
 Keep doing this until you reach a row whose E, G and F columns are empty.

This User Gave Thanks to durden_tyler For This Post:
# 25  
Old 06-29-2017
I did use the while statement before but that just printed the value of the score continously without stopping. This is what I did

Code:
            NG = ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B")
            cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
            print cpos ##prints only row E4:G4:F4 and not all
            while cpos:
                if (dict_pos.has_key(cpos)):
                    NG = dict_pos[cpos]
                    print dict_pos[cpos] ##prints the actual score
                else:
                    NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1
                    cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
            wb.save(sheet_xl_file)

process_xl_sheets()

and there was no value on the excel sheet saved and also the code did not stop

Code:
/usr/bin/python2.7 /home/test/new_dict.py 
 ##reads only row 4  
Scores(POS=u'73', ALT=u'G', REF=u'A')  
##prints all values from 'score.txt'  
{Scores(POS='2171', ALT='C', REF='T'): '5', Scores(POS='73', ALT='G', REF='A'): '11', Scores(POS='114', ALT='T', REF='C'): '1', Scores(POS='2080', ALT='C', REF='T'): '4', Scores(POS='1189', ALT='C', REF='T'): '1'}   
##returns the value from score.txt against row 4 of xlsx but does not save it to excel  11
11
11
11
11
11
11
11
11
....

# 26  
Old 06-29-2017
Quote:
Originally Posted by nans
...
...
I did use the while statement before but that just printed the value of the score continously without stopping. ...
That's because the "while" statement went into what is called an "infinite loop".
That's because "cpos" is always True everytime it is checked.
[ It means that "cpos" always has a value everytime it is checked and that value is Scores(POS='73', ALT='G', REF='A') ]

You want "cpos" to change in every iteration of the "while" loop.
And you want the row_no to increment in every iteration of the "while" loop.

You are incrementinng row_no and changing cpos within the else branch.
You need to increment row_no and change cpos within the while statement.
In Python, the indentation of the statement decides what is done "within" what.

---------- Post updated at 02:12 PM ---------- Previous update was at 10:33 AM ----------

Here's how your code is being executed; I have added line numbers at the left to help track the path of execution:

Code:
 1    NG = ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B")
 2    cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
 3    print cpos ##prints only row E4:G4:F4 and not all
 4    while cpos:
 5        if (dict_pos.has_key(cpos)):
 6            NG = dict_pos[cpos]
 7            print dict_pos[cpos] ##prints the actual score
 8        else:
 9            NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
10            row_no += 1
11            cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
12    wb.save(sheet_xl_file)
  
Line 1 => Set the value of NG to 'Unknown_June'.
Line 2 => Set the value of cpos to Scores(POS='73', ALT='G', REF='A')
Line 3 => Print the value of cpos
  
Line 4 => Is cpos True? Yes it is.
Line 5 => Does dict_pos have the key cpos = Scores(POS='73', ALT='G', REF='A')? Yes it does.
Line 6 => Set the value of NG to dict_pos[cpos] = 11
Line 7 => Print the value of dict_pos[cpos] = 11
Line 8 => Disregard the "else" branch and all 3 statements within it (lines 9, 10, 11) because "if" branch was processed. The control goes back to Line 4.
  
Line 4 => Is cpos True? Yes it is.
Line 5 => Does dict_pos have the key cpos = Scores(POS='73', ALT='G', REF='A')? Yes it does.
Line 6 => Set the value of NG to dict_pos[cpos] = 11
Line 7 => Print the value of dict_pos[cpos] = 11
Line 8 => Disregard the "else" branch and all 3 statements within it (lines 9, 10, 11) because "if" branch was processed. The control goes back to Line 4.
  
Line 4 => Is cpos True? Yes it is.
Line 5 => Does dict_pos have the key cpos = Scores(POS='73', ALT='G', REF='A')? Yes it does.
Line 6 => Set the value of NG to dict_pos[cpos] = 11
Line 7 => Print the value of dict_pos[cpos] = 11
Line 8 => Disregard the "else" branch and all 3 statements within it (lines 9, 10, 11) because "if" branch was processed. The control goes back to Line 4.
  
Line 4 => Is cpos True? Yes it is.
Line 5 => Does dict_pos have the key cpos = Scores(POS='73', ALT='G', REF='A')? Yes it does.
Line 6 => Set the value of NG to dict_pos[cpos] = 11
Line 7 => Print the value of dict_pos[cpos] = 11
Line 8 => Disregard the "else" branch and all 3 statements within it (lines 9, 10, 11) because "if" branch was processed. The control goes back to Line 4.
  
Line 4 => Is cpos True? Yes it is.
Line 5 => Does dict_pos have the key cpos = Scores(POS='73', ALT='G', REF='A')? Yes it does.
Line 6 => Set the value of NG to dict_pos[cpos] = 11
Line 7 => Print the value of dict_pos[cpos] = 11
Line 8 => Disregard the "else" branch and all 3 statements within it (lines 9, 10, 11) because "if" branch was processed. The control goes back to Line 4.
  
...
...
...
And so on, ad infinitum...
...
...

You have to give a chance for Line 10 and 11 to execute.
This User Gave Thanks to durden_tyler For This Post:
# 27  
Old 06-30-2017
I'm not quite sure what is going wrong, I have tried this as well

Code:
                 while cpos:
                    if (dict_pos.has_key(cpos)):
                        NG = dict_pos[cpos]
                        print dict_pos[cpos]
                    else:
                        NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1
                    cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
                wb.save(sheet_xl_file)

it still goes into an infinite loop.
Also, why does it not save '11' in the excel workbook for the first row and instead prints 'unknown' even though it shows the correct score on the terminal?

Last edited by nans; 06-30-2017 at 05:27 AM..
# 28  
Old 06-30-2017
Quote:
Originally Posted by nans
I'm not quite sure what is going wrong, I have tried this as well

Code:
                 while cpos:
                    if (dict_pos.has_key(cpos)):
                        NG = dict_pos[cpos]
                        print dict_pos[cpos]
                    else:
                        NG = 'Unknown_' + datetime.now().strftime("%B") + datetime.now().strftime("%Y")
                    row_no += 1
                    cpos = Scores(POS=cell_pos.value, ALT=cell_alt.value, REF=cell_ref.value)
                wb.save(sheet_xl_file)

it still goes into an infinite loop.
Also, why does it not save '11' in the excel workbook for the first row and instead prints 'unknown' even though it shows the correct score on the terminal?
Print the value of "cpos" inside the "while" loop.
It is the same every time.
That's the reason for the infinite loop.
You need the value of "cpos" to change in every iteration of the loop.
This User Gave Thanks to durden_tyler For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to insert data into black column( Secound Column ) in excel (.XLSX) file using shell script?

Source Code of the original script is down below please run the script and try to solve this problem this is my data and I want it column wise 2019-03-20 13:00:00:000 2019-03-20 15:00:00:000 1 Operating System LAB 0 1 1 1 1 1 1 1 1 1 0 1 (5 Replies)
Discussion started by: Shubham1182
5 Replies

2. Shell Programming and Scripting

Python soap and string to .xlsx conversion

Hi experts - I'm relatively new to python, but I have an requirement to automate getting a file from a WebLib server using an API. The file I'm requesting from this sever is an excel spreadsheet (.xlsx). I get a valid response back via an xml doc from the server. In this xml file I get... (8 Replies)
Discussion started by: timj123
8 Replies

3. Shell Programming and Scripting

Appending = in particular column in csv file

Hi, I have a requirement to append = in particular row in csv file. Data in csv is as follow: row1,a,a,a row2,b,b,b row3,c,c,c row4,d,d,d csv should be modified at row3 and no. of columns are not fixed but rows are. output should be as: row1,a,a,a row2,b,b,b row3,=c,=c,=c... (2 Replies)
Discussion started by: Divya1987
2 Replies

4. Shell Programming and Scripting

Appending column to rows

Hi All, Input.txt KGO Id "003" .......... .......... Par "CPara" BIN RECGET Name "DIR_PATH" Prompt "DIR_PATH" END RECGET KGO ............ .......... ............... KGO Id "077" .......... .......... (7 Replies)
Discussion started by: unme
7 Replies

5. UNIX for Dummies Questions & Answers

Appending a column of numbers in ascending order to a text file

I have a text file where I want to append a column of numbers in ascending orders. Input: 57 abc 25 def 32 ghi 54 jkl Output:57 abc 57 abc 1 25 def 2 32 ghi 3 54 jkl 4 How do I go about doing that? Thanks! (11 Replies)
Discussion started by: evelibertine
11 Replies

6. Shell Programming and Scripting

Appending new column to existing files

Hi, i want to add another column to existing files containing strings and need to have the final output as a csv file. i have quite a number of files, each with varying number of rows and i need to append the string "test" for all the valid rows for each file. my sample raw files looks like this... (8 Replies)
Discussion started by: ida1215
8 Replies

7. UNIX for Dummies Questions & Answers

Appending date value mmdd to first column in file

Hi , I have a file with a running sequence number. I need to append a date value mmdd format on to the first column. for e.g.: The file contains records as 001 abc 002 cde 003 edf 004 fgh 005 hik The output should be 1111001 abc 1111002 cde 1111003 edf 1111004 ... (1 Reply)
Discussion started by: kalyansid
1 Replies

8. Shell Programming and Scripting

appending column file

Hi all, I have two files with the same number of lines the first file is a.dat and looks like 0.000 1.000 1.000 2.000 ... the fields are tab separated the second file is b.dat and looks like 1.2347 0.546 2.3564 0.321 ... the fields are tab separated I would like to have a file c.dat... (4 Replies)
Discussion started by: f_o_555
4 Replies

9. Shell Programming and Scripting

Appending 'string' to file as first column.

Hi , I have the below file with 6 columns.I want to append 'File1' as the 1 column to the file. i have the sample code .It is not working . can u please correct this or make new one ..... awk 'print {'File1',$1,$2,$3,$4,$5,$6}' Source_File> Result_File Source_File:... (6 Replies)
Discussion started by: satyam_sat
6 Replies

10. Shell Programming and Scripting

Appending a column in one file to the corresponding line in a second

It appears that this has been asked and answered in similar fashions previously, but I am still unsure how to approach this. I have two files containing user information: fileA ttim:/home/ttim:Tiny Tim:632 ppinto:/home/ppinto:Pam Pinto:633 fileB ttim:xkfgjkd*&#^jhdfh... (3 Replies)
Discussion started by: suzannef
3 Replies
Login or Register to Ask a Question