Full Discussion: Playing with Volume of data
Shell Programming and Scripting, post 302330241 by darshanw on Tuesday 30th of June 2009 02:56:37 PM

Quick problem statement:
How to read/extract data from a very large file.

Details:
We have a big problem at work. We are running on the Solaris platform (E25).
There is a very large file of roughly 200 million records, and each record is more than 1000 columns long. The columns are separated by semicolons.
Code:
Sample File:
01;1;;0001;123;;;;ZBCA10;;;;;;;;;20060116;99991
 ;/;/;/;123;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;
 ;/;/;/;123;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;123;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
01;1;;0001;421876;;;;BCA030;BCA010;;;;;;;;20060502
 ;/;/;/;421876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;421876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;421876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
01;1;;0001;42187;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;4216;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;4876;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;21876;;;;BCA030;BCA010;;;;;;;;20060502
01;1;;0001;876;;;;BCA030;BCA010;;;;;;;;20060502
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/
 ;/;/;/;876;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/;/

About the file:
Every line starts with either 01 or a space (" "). The records are laid out as header and detail rows: a row starting with 01 is a header line, and the rows below it are its detail lines. Columns are separated by semicolons. Column 5 has the same value in a header row and in its detail rows.
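Because column 5 is shared between a header and its detail rows, that property can be checked in a single pass before any splitting is done. A minimal sketch in awk, assuming a header is exactly a row whose first field is `01` and every other row is a detail row; `check_groups` is a hypothetical helper name:

```shell
# check_groups FILE  (hypothetical helper name)
# Reports detail rows whose column 5 differs from column 5 of the most
# recent header row. Assumes a header is any row whose first field is "01"
# and that every other row is a detail row.
check_groups() {
    awk -F';' '
        BEGIN { headers = 0; bad = 0; key = "" }
        $1 == "01" { key = $5; headers++; next }         # header: remember its column 5
        key == "" || ($5 "") != (key "") {               # detail with no header, or a different key
            printf "line %d: column 5 \"%s\" does not match header \"%s\"\n", NR, $5, key
            bad++
        }
        END {
            printf "%d header rows, %d mismatched detail rows\n", headers, bad
            exit (bad > 0)                                # non-zero status if anything is wrong
        }
    ' "$1"
}
```

The `""` concatenations force a string comparison, so keys such as `0123` and `123` are not treated as equal.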
What I need to do:
1) Extract the header-detail groups in batches. For example, I need to write every 50000 header-detail groups (a header row plus its detail rows) into a separate file.
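One way to do task 1 without re-reading the file for every chunk is a single awk pass that counts header rows and switches to a new output file every N groups. A sketch, assuming a header is any row whose first field is `01`; the `split_groups` name and the `chunk_0000.txt` naming scheme are hypothetical:

```shell
# split_groups FILE GROUPS_PER_FILE [PREFIX]  (hypothetical helper name)
# Reads FILE once and writes each run of GROUPS_PER_FILE header-detail
# groups to PREFIX0000.txt, PREFIX0001.txt, ... A group is one header row
# (first field "01") plus the detail rows after it, so no group is split.
split_groups() {
    awk -F';' -v n="$2" -v prefix="${3:-chunk_}" '
        $1 == "01" {
            if (groups % n == 0) {                          # time for a new output file
                if (out != "") close(out)                   # keep only one file open
                out = sprintf("%s%04d.txt", prefix, groups / n)
            }
            groups++
        }
        out == "" {                                         # detail row before any header
            print "line " NR ": detail row before first header" > "/dev/stderr"
            exit 1
        }
        { print > out }                                     # copy the row to the current chunk
    ' "$1"
}
```

Called as `split_groups fileabc.txt 50000 chunk_`, this would write `chunk_0000.txt`, `chunk_0001.txt`, and so on, reading the input exactly once. On Solaris, `/usr/bin/awk` is the old awk; `nawk` or `/usr/xpg4/bin/awk` is likely needed for `close()` and `-v`.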

I tried the sed command:
sed -n '100,200002p' fileabc.txt
However, the performance is not up to the mark. The problem with sed is that even when it is told to print only rows 100 to 200002, it still scans the entire file.
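sed keeps reading to the end of the file unless it is told to stop. Adding a `q` command on the last wanted line makes it exit there, and a `head`/`tail` pipeline behaves the same way. A sketch with hypothetical function names; note that the lines before FIRST still have to be read, so this helps most for ranges near the start of the file:

```shell
# print_lines FILE FIRST LAST  (hypothetical helper name)
# Prints lines FIRST..LAST, then quits at LAST so sed does not read the
# rest of the file. Lines before FIRST still have to be read.
print_lines() {
    sed -n -e "$2,$3p" -e "$3q" "$1"
}

# print_lines_head_tail FILE FIRST LAST  (hypothetical helper name)
# Same result with head/tail: head stops after LAST lines, and
# tail -n +FIRST drops everything before line FIRST.
print_lines_head_tail() {
    head -n "$3" "$1" | tail -n +"$2"
}
```

For the range in the post, `print_lines fileabc.txt 100 200002` would stop after line 200002 instead of reading all 200 million records.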
When I ran sed over the entire file, it took a couple of days. That's far too long. What better options are there?
Is there a way to make this operation run in parallel?
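Finding the group boundaries requires reading the file in order, so the split itself is one sequential pass and is usually limited by disk throughput rather than CPU. What can run in parallel is whatever is done to the chunks afterwards. A sketch of a simple job limiter using `&` and `wait`; `run_parallel` and the `count_headers` example job are hypothetical:

```shell
# run_parallel MAX_JOBS COMMAND FILE...  (hypothetical helper name)
# Runs COMMAND FILE in the background for each FILE, at most MAX_JOBS at a
# time. Each batch is waited for as a whole before the next one starts.
run_parallel() {
    rp_max=$1
    rp_cmd=$2
    shift 2
    rp_running=0
    for rp_file in "$@"; do
        "$rp_cmd" "$rp_file" &              # one background job per file
        rp_running=$((rp_running + 1))
        if [ "$rp_running" -ge "$rp_max" ]; then
            wait                            # let the current batch finish
            rp_running=0
        fi
    done
    wait                                    # wait for the last, partial batch
}

# count_headers FILE  (hypothetical example job)
# Writes the number of header rows in FILE to FILE.headers.
count_headers() {
    awk -F';' '$1 == "01" { n++ } END { print n + 0 }' "$1" > "$1.headers"
}
```

`run_parallel 4 count_headers chunk_*.txt` would run at most four jobs at once. Batching with a plain `wait` leaves slots idle while one slow job finishes, so treat it as a portable starting point only.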
Is there a shell command that copies only specific rows without scanning the entire file?
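No standard tool can jump to line N without reading the lines before it, because line boundaries are only known by reading. What can be skipped is a byte range: if an index of byte offsets is built once, later extractions can start at the right byte. A sketch, assuming `LC_ALL=C` makes awk's `length()` count bytes and that every line ends in a single newline; `build_index` and `extract_chunk` are hypothetical names. Whether `tail -c +N` seeks directly or reads through the skipped bytes depends on the implementation (GNU tail is expected to seek on a regular file):

```shell
# build_index FILE GROUPS_PER_CHUNK  (hypothetical helper name)
# One sequential pass. For every chunk of GROUPS_PER_CHUNK header-detail
# groups it prints: chunk number, starting byte offset, number of lines.
# Assumes LC_ALL=C makes length() count bytes and every line ends in "\n".
build_index() {
    LC_ALL=C awk -F';' -v n="$2" '
        BEGIN { offset = 0 }
        $1 == "01" {
            if (groups % n == 0) {                         # a new chunk starts here
                if (chunk != "") printf "%d %.0f %d\n", chunk, start, lines
                chunk = groups / n; start = offset; lines = 0
            }
            groups++
        }
        { lines++; offset += length($0) + 1 }              # +1 for the newline
        END { if (chunk != "") printf "%d %.0f %d\n", chunk, start, lines }
    ' "$1"
}

# extract_chunk FILE INDEX CHUNK  (hypothetical helper name)
# Looks CHUNK up in INDEX, starts reading FILE at that byte offset
# (tail -c +N is 1-based), and stops after the chunk's line count.
extract_chunk() {
    ec_entry=$(awk -v c="$3" '$1 == c { print $2, $3; exit }' "$2")
    if [ -z "$ec_entry" ]; then
        echo "chunk $3 not found in $2" >&2
        return 1
    fi
    ec_start=${ec_entry% *}                 # byte offset (0-based)
    ec_lines=${ec_entry#* }                 # number of lines in the chunk
    tail -c +"$((ec_start + 1))" "$1" | head -n "$ec_lines"
}
```

With the index built once by `build_index fileabc.txt 50000 > fileabc.idx`, `extract_chunk fileabc.txt fileabc.idx 7` would print the eighth chunk (chunks are numbered from 0). Offsets beyond 2 GB need a shell with 64-bit arithmetic.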

2) I will let you know the second question later.

Thanks,
darshanw
 

DAHDI_HARDWARE(8)					User Contributed Perl Documentation					 DAHDI_HARDWARE(8)

NAME
       dahdi_hardware - Shows Dahdi hardware devices.

SYNOPSIS
       dahdi_hardware [-v][-x]

OPTIONS
       -v  Verbose output - show spans used by each device etc. Currently only
           implemented for the Xorcom Astribank.

       -x  Show disconnected Astribank unit, if any.

DESCRIPTION
       Show all dahdi hardware devices. Devices are recognized according to
       lists of PCI and USB IDs in Dahdi::Hardware::PCI.pm and
       Dahdi::Hardware::USB.pm. For PCI it is possible to detect by sub-vendor
       and sub-product ID as well.

       The first output column is the connector: a bus-specific field that
       shows where this device is.

       The second field shows which driver should handle the device. A "-"
       sign marks that the device is not yet handled by this driver. A "+"
       sign means that the device is handled by the driver.

       For the Xorcom Astribank (and in the future: for other Dahdi devices)
       some further information is provided from the driver. Those extra lines
       always begin with spaces.

       Example output:

       Without drivers loaded:

         usb:001/002        xpp_usb-     e4e4:1152 Astribank-multi FPGA-firmware
         usb:001/003        xpp_usb-     e4e4:1152 Astribank-multi FPGA-firmware
         pci:0000:01:0b.0   wctdm-       e159:0001 Wildcard TDM400P REV H

       With drivers loaded, without -v:

         usb:001/002        xpp_usb+     e4e4:1152 Astribank-multi FPGA-firmware
         usb:001/003        xpp_usb+     e4e4:1152 Astribank-multi FPGA-firmware
         pci:0000:01:0b.0   wctdm+       e159:0001 Wildcard TDM400P REV E/F

       With drivers loaded, with -v:

         usb:001/002        xpp_usb+     e4e4:1152 Astribank-multi FPGA-firmware
                 LABEL=[usb:123]      CONNECTOR=usb-0000:00:1d.7-1
                 XBUS-00/XPD-00: FXS      Span 2
                 XBUS-00/XPD-10: FXS      Span 3
                 XBUS-00/XPD-20: FXS      Span 4
                 XBUS-00/XPD-30: FXS      Span 5
         usb:001/003        xpp_usb+     e4e4:1152 Astribank-multi FPGA-firmware
                 LABEL=[usb:4567]     CONNECTOR=usb-0000:00:1d.7-4
                 XBUS-01/XPD-00: FXS      Span 6 XPP-SYNC
                 XBUS-01/XPD-10: FXO      Span 7
                 XBUS-01/XPD-20: FXO      Span 8
                 XBUS-01/XPD-30: FXO      Span 9
         pci:0000:01:0b.0   wctdm+       e159:0001 Wildcard TDM400P REV E/F

perl v5.14.2                      2009-04-20                  DAHDI_HARDWARE(8)
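The description says the second field ends in "-" when a device is not yet handled by its driver, and that the extra detail lines always begin with spaces. A small filter built on those two rules could list the unhandled devices. This is a sketch that assumes the column layout shown in the example output; `unhandled_devices` is a hypothetical name, not part of DAHDI:

```shell
# unhandled_devices  (hypothetical filter, reads dahdi_hardware output on stdin)
# Prints the connector (field 1) and driver (field 2) of every device whose
# driver field ends in "-", i.e. not yet handled by that driver.
# Lines starting with whitespace (the extra -v detail lines) are skipped.
unhandled_devices() {
    awk '!/^[ \t]/ && $2 ~ /-$/ { print $1, $2 }'
}
```

Usage: `dahdi_hardware | unhandled_devices`. Applied to the "Without drivers loaded" example, it would print all three devices; with drivers loaded it would print nothing.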
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.