Extracting data from specific rows and columns from multiple csv files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Extracting data from specific rows and columns from multiple csv files
# 1  
Old 08-05-2015
Question Extracting data from specific rows and columns from multiple csv files

I have a series of csv files in the following format
eg file1
Code:
Experiment Name,XYZ_07/28/15,
Specimen Name,Specimen_001,
Tube Name, Control,
Record Date,7/28/2015 14:50,
$OP,XYZYZ,
GUID,abc,
Population,#Events,%Parent
All Events,10500,
P1,10071,95.9
Early Apoptosis,1113,11.1
Late Apoptosis,1091,10.8
Live,7856,78
Necrosis,11,0.1

file2
Code:
Experiment Name,ABC_07/28/15
Specimen Name,Specimen_001
Tube Name,OA_10uM
Record Date,"Jul 28, 2015 2:52:36 PM"
$OP,XYZYZ 
GUID,abc
Population,#Events,%Parent
All Events,12156,
P1,11821,97.2
Early Apoptosis,806,6.8
Late Apoptosis,901,7.6
Live,10087,85.3
Necrosis,27,0.2

I would like to extract Tube names ( 3rd row, 2nd column) and Last 4 rows {labels in column 1 only from 1st file and values in columns 3 from all files and write to to an output file.

expected output is :
Code:
Tube Name	Early Apoptosis	Late Apoptosis	Live	Necrosis
Control	11.1	10.8	78	0.1
OA_10uM	6.8	7.6	85.3	0.2

Can someone help me figure out the code ?

Thanks a ton
Moderator's Comments:
Mod Comment Please use CODE tags (not ICODE tags) for all multiline sample input, output, and code segments.

Last edited by Don Cragun; 08-06-2015 at 12:02 AM.. Reason: Change ICODE tags to CODE tags.
# 2  
Old 08-05-2015
Confused about ending commas. Some lines end in commas and others don't: this differs between file1 and file2. Are they actually that different or are these typos?

Are there constraints on the solution, must it be bash or can one use awk or perl?
# 3  
Old 08-05-2015
My bad sorry.
when I open these files on notepad, I only see line 8 to have the comma in the end
I don't mind any solution as long as it works.
Thanks
# 4  
Old 08-05-2015
Quote:
Originally Posted by pawannoel
My bad sorry.
when I open these files on notepad, I only see line 8 to have the comma in the end
I don't mind any solution as long as it works.
Thanks
Could you edit your initial post to display the lines in file1 and file2 as they show in your editor?
Do not use the icode tags, copy and paste, highlight it and click in the square code icon.
# 5  
Old 08-06-2015
ok
so I'm providing 2 new files : file3 and file4 with your suggestions
file3
Code:
Experiment Name,ABC_07/28/15
Specimen Name,Specimen_001
Tube Name,Unstained Control
Record Date,"Jul 28, 2015 2:42:33 PM"
$OP,XYZ 
GUID,abc
Population,#Events,%Parent
All Events,15282,
P1,14744,96.5
Early Apoptosis,71,0.5
Late Apoptosis,1,0.0
Live,14672,99.5
Necrosis,0,0.0

file4
Code:
Experiment Name,ABC_07/28/15
Specimen Name,Specimen_001
Tube Name,OA_10uM
Record Date,"Jul 28, 2015 2:52:36 PM"
$OP,KLM 
GUID,efg 
Population,#Events,%Parent
All Events,12156,
P1,11821,97.2
Early Apoptosis,806,6.8
Late Apoptosis,901,7.6
Live,10087,85.3
Necrosis,27,0.2

Thanks
# 6  
Old 08-06-2015
Try
Code:
awk -F, -v HD="Tube Name,Early Apoptosis,Late Apoptosis,Live,Necrosis" '
BEGIN           {for (N=i=split (HD, T); i>0; i--) SRCH[T[i]]
                 for (i=1; i<=N; i++) printf "%s\t", T[i]
                 print ""
                }
FNR==1 && L     {for (i=1; i<=N; i++) printf "%s\t", PR[T[i]]
                 print ""
                 delete PR
                }

$1 in SRCH      {PR[$1]=$NF
                 L=1
                }
END             {for (i=1; i<=N; i++) printf "%s\t", PR[T[i]]
                 print ""
                }
'  file[12]
Tube Name       Early Apoptosis Late Apoptosis  Live    Necrosis
Unstained Control       0.5     0.0     99.5    0.0
OA_10uM 6.8     7.6     85.3    0.2

# 7  
Old 08-10-2015
Code:
import os
from collections import deque

def parse_file(f,first=0):
	header=""
	line_3=""
	cnt=0
	q=deque([],maxlen=4)
	with open(f) as file:
		for line in file:
			line=line.replace("\n","")
			cnt+=1
			if cnt==3:
				line_3 = line
			else:
				# print(line)
				q.extend([line])
				
	items = line_3.split(",")
	if first:
		header = header+" "+items[0]
	content=items[1]
	for i in q:
		items = i.split(",")
		if first:
			header = header+" "+items[0]
		content = content+" "+items[2]
	if first:
		print(header)
	print(content)
	
file_cnt=0				
for file in os.listdir("leo"):
	file="leo/"+file
	if file_cnt==0:
		file_cnt+=1
		parse_file(file,first=1)
	else:
		parse_file(file)

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extracting specific files from multiple .tgz files

Hey, I have number of .tgz files and want to extract the file with the ending *results.txt from each one. I have tried for file in *.tgz; do tar --wildcards -zxf $file *results.txt; doneas well as list=$(ls *.tgz) for i in $list; do tar --wildcards -zxvf $i *.results.txt; done... (1 Reply)
Discussion started by: jfern
1 Replies

2. Shell Programming and Scripting

Compare 2 csv files by columns, then extract certain columns of matcing rows

Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX coding on comparing two large csv files (greater than 10 GB in size), and output a file with matching columns. I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'... (5 Replies)
Discussion started by: bkane3
5 Replies

3. Shell Programming and Scripting

Deleting all the fields(columns) from a .csv file if all rows in that columns are blanks

Hi Friends, I have come across some files where some of the columns don not have data. Key, Data1,Data2,Data3,Data4,Data5 A,5,6,,10,, A,3,4,,3,, B,1,,4,5,, B,2,,3,4,, If we see the above data on Data5 column do not have any row got filled. So remove only that column(Here Data5) and... (4 Replies)
Discussion started by: ks_reddy
4 Replies

4. UNIX for Dummies Questions & Answers

Extracting data between specific lines, multiple times

I need help extracting specific lines in a text file. The file looks like this: POSITION TOTAL-FORCE (eV/Angst) ----------------------------------------------------------------------------------- 1.86126 1.86973 1.86972 ... (14 Replies)
Discussion started by: captainalright
14 Replies

5. Shell Programming and Scripting

extracting columns falling within specific ranges for multiple files

Hi, I need to create weekly files from daily records stored in individual monthly filenames from 1999-2010. my sample file structure is like the ones below: daily record stored per month: 199901.xyz, 199902.xyz, 199903.xyz, 199904.xyz ...199912.xyz records inside 199901.xyz (original data... (4 Replies)
Discussion started by: ida1215
4 Replies

6. UNIX for Dummies Questions & Answers

Extracting data from PDF files into CSV file

Hi, I have several hundreds of PDFfiles number 01.pdf, 02.pdf, 03.pdf, etc in one folder. These are vey long documentd with a lot of information (text, tables, figures, etc). I need to extract the information asociated with one disease in particular (Varicella). The information I need to... (5 Replies)
Discussion started by: Xterra
5 Replies

7. Shell Programming and Scripting

Extracting columns from multiple files with awk

hi everyone! I'd like to extract a single column from 5 different files and put them together in an output file. I saw a similar question for 2 input files, and the line of code workd very well, the code is: awk 'NR==FNR{a=$2; next} {print a, $2}' file1 file2 I added the file3, file4 and... (10 Replies)
Discussion started by: orcaja
10 Replies

8. UNIX for Dummies Questions & Answers

Extracting columns from multiple files with awk

hi everyone! I already posted it in scripts, I'm sorry, it's doubled I'd like to extract a single column from 5 different files and put them together in an output file. I saw a similar question for 2 input files, and the line of code workd very well, the code is: awk 'NR==FNR{a=$2; next}... (1 Reply)
Discussion started by: orcaja
1 Replies

9. UNIX for Dummies Questions & Answers

Convert Rows to Columns Specific Data

I have this data M36 AREA INFORMATION MDN = 0485009346 ESN = H'15fda0b0 TYPE = HLR RESULT = NOK REASON = UNRECOGNIZED MIN COMPLETED AREA INFORMATION MDN = 0498044402 ESN = H'15fdac11 TYPE... (2 Replies)
Discussion started by: krabu
2 Replies

10. Shell Programming and Scripting

Perl - extracting data from .csv files

PROJECT: Extracting data from an employee timesheet. The timesheets are done in excel (for user ease) and then converted to .csv files that look like this (see color code key below): ,,,,,,,,,,,,,,,,,,, 9/14/2003,<-- Week Ending,,,,,,,,,,,,,,,,,, Craig Brennan,,,,,,,,,,,,,,,,,,,... (3 Replies)
Discussion started by: kregh99
3 Replies
Login or Register to Ask a Question