UNIX for Dummies Questions & Answers: Post 302411785 by Boltzmann on Friday 9th of April 2010 12:18:30 PM
04-09-2010
Extracting data from many compressed files

I have a large number (50,000) of fairly large bzip2-compressed files, and I need only certain lines from each of them (every relevant line contains a particular keyword). Each file contains 300 such lines. The files are indexed by file number (file_name.1.bz2, file_name.2.bz2, ... , file_name.50000.bz2).

So I need to uncompress each file, pull out the relevant lines, append them to combined_file_name, and compress the file again. I wrote the following bash script to do this:

Code:
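i=1
num=50000
while [ "$i" -le "$num" ]
do
  # Decompress in place, collect the matching lines, then recompress.
  bunzip2 "file_name.$i.bz2"
  grep key_word "file_name.$i" >> combined_file_name
  bzip2 "file_name.$i"
  i=$((i + 1))  # $((...)) replaces the deprecated $[...] arithmetic form
done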

What I am wondering is whether there is a better/faster way to accomplish this.

Any advice would be much appreciated.

Thank you

---------- Post updated at 12:18 PM ---------- Previous update was at 11:50 AM ----------

While reading another post on here, I just discovered the bzcat command.

I am guessing something like this would be faster:

Code:
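i=1
num=50000
while [ "$i" -le "$num" ]
do
  # bzcat decompresses to stdout, so the .bz2 file is never rewritten.
  bzcat "file_name.$i.bz2" | grep key_word >> combined_file_name
  i=$((i + 1))  # $((...)) replaces the deprecated $[...] arithmetic form
done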
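
For what it's worth, the same loop can be tightened a little. The following is only a sketch, assuming bash and the file layout described above; the function name extract_lines and its parameters are made up for illustration. It uses a C-style for loop and redirects the whole loop once, so combined_file_name is opened a single time instead of 50,000 times. Note that this truncates the output file rather than appending to it.

```shell
#!/usr/bin/env bash
# Sketch only: extract_lines is a hypothetical helper, not from the original post.
# Usage: extract_lines PREFIX COUNT KEYWORD OUTPUT_FILE
extract_lines() {
  local prefix=$1 num=$2 key=$3 out=$4
  local i
  for ((i = 1; i <= num; i++)); do
    # Stream the decompressed data straight into grep; nothing is written
    # back to disk except the matching lines.
    # grep exits with 1 when a file has no match; treat that as success
    # and let any other failure status propagate.
    bzcat -- "$prefix.$i.bz2" | grep -- "$key" || [ $? -eq 1 ]
  done > "$out"   # one redirection for the whole loop
}
```

With the names from this post it would be called as: extract_lines file_name 50000 key_word combined_file_name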

Is there something even better?

Thanks again in advance.
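
One direction that might help, since bzip2 decompression is mostly CPU-bound, is running several decompressions at once. The following is a rough sketch and not a tested recommendation: it assumes bash and an xargs that supports the -0 and -P options (GNU and BSD xargs do; they are not in POSIX). The function name parallel_extract, the default of 4 jobs, and the temporary ".hits" suffix are all invented for illustration. Each worker writes to its own file so that concurrent writers never share combined_file_name, and the results are merged in file-number order afterwards.

```shell
#!/usr/bin/env bash
# Sketch only: parallel_extract and the ".hits" suffix are hypothetical names.
# Usage: parallel_extract PREFIX COUNT KEYWORD OUTPUT_FILE [JOBS]
parallel_extract() {
  local prefix=$1 num=$2 key=$3 out=$4 jobs=${5:-4}
  local i
  # Feed NUL-separated archive names to xargs, which runs up to $jobs
  # workers at a time. Each worker writes matches to <archive>.hits.
  # "exit 0" keeps a no-match grep from being reported as a failure.
  for ((i = 1; i <= num; i++)); do
    printf '%s\0' "$prefix.$i.bz2"
  done | KEY=$key xargs -0 -n 1 -P "$jobs" sh -c \
    'bzcat -- "$1" | grep -- "$KEY" > "$1.hits"; exit 0' _
  # Merge the per-archive results in numeric order and remove the temporaries.
  for ((i = 1; i <= num; i++)); do
    cat -- "$prefix.$i.bz2.hits"
    rm -f -- "$prefix.$i.bz2.hits"
  done > "$out"
}
```

With the names from this post it would be called as: parallel_extract file_name 50000 key_word combined_file_name 4. Whether this is actually faster depends on the number of cores and on disk speed, so it is worth timing on a few hundred files first.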
 
