The code is very long, so I have attached only the relevant part, where I am trying to copy the contents of the text file into the Excel worksheet. But even with this, I end up with a blank worksheet.
Code:
#!/usr/bin/python
import os
from openpyxl.reader.excel import load_workbook
import csv
from openpyxl.drawing.image import Image
import PIL
xl_directory = r'/home/test'
txt_directory = r'/home/test'
for xl_root, xl_dirs, xl_files in os.walk(xl_directory):
    for xl_file in xl_files:
        if xl_file.endswith('.xlsx'):
            xl_abs_file = os.path.join(xl_root, xl_file)
            wb = load_workbook(xl_abs_file, data_only=True)
            ws = wb.get_sheet_by_name('Unannotated')
            ##clear the contents of the file
            for row in ws['A4:U1000']:
                for cell in row:
                    cell.value = None
            image = Image('/home/logo3.jpg')
            ws.add_image(image, 'A1')
            ## go through text file and write data on worksheet
            for txt_root, txt_dirs, txt_files in os.walk(txt_directory):
                for txt_file in txt_files:
                    if txt_file == xl_file.replace('xlsx', 'txt'):
                        with open(os.path.join(txt_root, txt_file)) as fh:
                            reader = csv.reader(fh, delimiter='\t')
                            [next(reader) for skip in range(1)]
                            for row in reader:
                                ws.append(row)
            wb.save(xl_abs_file)
Here's some code that you might like to try.
Code:
#!python
import os
from openpyxl import load_workbook
from datetime import datetime

# Variables
sheet_directory = r'<path_of_Excel_files>'
text_directory = r'<path_of_text_files>'

# Subroutines
def get_text_data(txt_filename):
    dict_pos = {}
    for text_root, text_dirs, text_files in os.walk(text_directory):
        for text_file in text_files:
            if text_file == txt_filename:
                # A matching text file was found
                first_line = True
                with open(os.path.join(text_root, text_file)) as fh:
                    for line in fh:
                        # Skip the header; read the data into the dictionary
                        if first_line:
                            first_line = False
                            continue
                        line = line.rstrip('\n')
                        x = line.split('\t')
                        # Key: first column ("Pos"); value: fourth column ("Score")
                        dict_pos[x[0]] = x[3]
    return dict_pos

def process_xl_sheets():
    for sheet_root, sheet_dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if sheet_file.endswith('.xlsx'):
                # Read the corresponding text file from the text_directory and
                # populate a dictionary of "Pos" values.
                dpos = get_text_data(sheet_file.replace('.xlsx', '.txt'))
                sheet_xl_file = os.path.join(sheet_root, sheet_file)
                wb = load_workbook(sheet_xl_file, data_only=True)
                ws = wb.get_sheet_by_name('rawdata')
                # If you already know the columns that have the headers "Pos" and
                # "Score", set them here. Otherwise, iterate through the first row
                # to determine those columns.
                pos_col_no = 'C'
                score_col_no = 'F'
                row_no = 2
                cell = ws[pos_col_no + str(row_no)]
                # Walk down the "Pos" column until the first empty cell
                while cell.value:
                    if str(cell.value) in dpos:
                        ws[score_col_no + str(row_no)] = dpos[str(cell.value)]
                    else:
                        ws[score_col_no + str(row_no)] = 'Unknown_' + datetime.now().strftime("%B%Y")
                    row_no += 1
                    cell = ws[pos_col_no + str(row_no)]
                wb.save(sheet_xl_file)

# Main section
process_xl_sheets()
A few things that come to mind as I look at the code again:
1) After "dpos" is assigned, you may want to do further processing only if dpos is not empty. Note that dpos will be empty if no text file corresponding to an Excel file is found. In such cases, there is no point in opening the Excel spreadsheet at all.
2) In the "get_text_data" subroutine, you may want to process the first row and see if x[0] is "Pos" and x[3] is "Score". If not, then you can avoid processing the text file entirely.
3) If there is no worksheet called "rawdata", then continue to the next iteration of the loop.
4) If there are a great many Excel and text files (say hundreds or thousands or more), then you may want to first create a dictionary of Excel => text files and then iterate through the key/value pairs, processing them one by one. The existence of a file can be quickly checked using "os.path.isfile(<filename>)" - this avoids the unnecessary looping through the directory. In fact, come to think of it, you can refactor the posted code and implement this concept to see if it improves the run time.
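Points 2) and 4) could be sketched roughly as below. This is only an illustration, not tested against your data: the helper names "pair_files" and "read_scores" are made up for this example, and it assumes the text files sit directly in the text directory (not in subfolders) and that the header columns are literally named "Pos" and "Score".

```python
import os

def pair_files(sheet_directory, text_directory):
    """Build a dictionary of Excel path => text path.

    Assumption: each text file lives directly in text_directory and has
    the same base name as its Excel file.
    """
    pairs = {}
    for sheet_root, _dirs, sheet_files in os.walk(sheet_directory):
        for sheet_file in sheet_files:
            if not sheet_file.endswith('.xlsx'):
                continue
            txt_name = sheet_file[:-len('.xlsx')] + '.txt'
            txt_path = os.path.join(text_directory, txt_name)
            # One existence check instead of walking the text directory again
            if os.path.isfile(txt_path):
                pairs[os.path.join(sheet_root, sheet_file)] = txt_path
    return pairs

def read_scores(txt_path):
    """Return {Pos: Score}, or an empty dict if the header is not as expected."""
    scores = {}
    with open(txt_path) as fh:
        header = fh.readline().rstrip('\n').split('\t')
        # Give up early unless column 1 is "Pos" and column 4 is "Score"
        if len(header) < 4 or header[0] != 'Pos' or header[3] != 'Score':
            return scores
        for line in fh:
            x = line.rstrip('\n').split('\t')
            if len(x) > 3:  # ignore short or blank lines
                scores[x[0]] = x[3]
    return scores
```

You would then loop over "pair_files(sheet_directory, text_directory).items()", call "read_scores" on each text path, and open the workbook only when the returned dictionary is not empty, which also takes care of point 1).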
Last edited by durden_tyler; 06-22-2017 at 06:21 PM..
Reason: Added a few more thoughts on the code.