Post 302668659 by DeltaComp on Monday 9th of July 2012 06:57:41 PM
How to parse a huge 600MB zipped file?

I'm new to Unix and am trying to parse a huge 600MB zipped (bzip2) file.
I need to bzcat this file only once and do some calculations (word count) on the lines based on certain criteria (see script).
The correct result/output should be:
column1=6
column2=4
The problem is that I'm getting column2=0 (see results).
Could you please help?
Thanks


source file: test.test.bz2
Code:
1,1,2,3
1,2,1,2
2,1,2,2
3,1,1,1
1,2,1,1
2,2,2,2
3,2,2,2

Script: test.bsh
Code:
 
#!/bin/bash
bzcat test.test.bz2 |
while read line
do
column1=$(awk '{FS=","} {print $1}' | uniq | wc -l)
                echo $column1
column2=$(awk '{FS=","} {print $2}' | uniq | wc -l)
                echo $column2
done

Results
Code:
bash -vx test.bsh

#!/bin/bash
bzcat test.test.bz2 |
while read line
do
column1=$(awk '{FS=","} {print $1}' | uniq | wc -l)
                echo $column1
column2=$(awk '{FS=","} {print $2}' | uniq | wc -l)
                echo $column2
done
+ bzcat test.test.bz2
+ read line
awk '{FS=","} {print $1}' | uniq | wc -l
++ awk '{FS=","} {print $1}'
++ uniq
++ wc -l
+ column1='       6'
+ echo 6
6
awk '{FS=","} {print $2}' | uniq | wc -l
++ awk '{FS=","} {print $2}'
++ uniq
++ wc -l
+ column2='       0'
+ echo 0
0
+ read line
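
A possible reading of the trace above, offered as a sketch rather than a verified answer: `read line` takes the first line off the pipe, the first awk then reads everything that is left on standard input, and the second awk starts at end-of-file, which would explain column2=0. Setting `FS` inside the main action also takes effect only from the second record awk sees, so `-F,` is used here instead. The function name `count_runs` is made up for illustration. It counts runs of adjacent equal values per column in a single pass, which is the same number `uniq | wc -l` reports.

```shell
#!/bin/bash
# Hypothetical helper: reads comma-separated lines on stdin once and
# counts, for columns 1 and 2, how many runs of adjacent equal values
# occur (the same figure that `uniq | wc -l` gives for each column).
count_runs() {
    awk -F, '
        NR == 1 || $1 != prev1 { c1++ }    # new run starts in column 1
        NR == 1 || $2 != prev2 { c2++ }    # new run starts in column 2
        { prev1 = $1; prev2 = $2 }         # remember values for the next line
        END { printf "column1=%d\ncolumn2=%d\n", c1, c2 }
    '
}
```

With the file name from the post this would be called as `bzcat test.test.bz2 | count_runs`, so the archive is decompressed only once. If the number of distinct values per column is wanted instead of the number of adjacent runs, this approach would not fit and a sort or an awk array would be needed.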


Last edited by fpmurphy; 07-09-2012 at 08:33 PM.. Reason: code tags please!
 
