Shell Programming and Scripting: Post 302159541 by srsahu75 on Friday 18th of January 2008 01:21:01 AM
How to extract data from a huge file?

Hi,
I have a huge file of bibliographic records in a standard format. I need a script to perform the following repeatable task:

1. Create one folder for each line in the input file that starts with "item_" (for example, item_3908), using that line as the folder name.
2. Create a file named "contents" in each folder, containing the string "license.txt", a tab character (\t), and then "bundle:LICENSE".
3. Create a file named "dublin_core.xml" in each "item_*" folder, containing the text that follows that "item_*" line in the input file. The text to extract starts with <dublin_core schema="dc"> and ends with </dublin_core>, both inclusive.

The following are sample records from the file:

item_3908
<dublin_core schema="dc">
<dcvalue element="contributor" qualifier="author">Fernandes, A.A.</dcvalue>
<dcvalue element="contributor" qualifier="author">Sarma, Y.V.B.</dcvalue>
<dcvalue element="title" qualifier="none">Directional spectrum of ocean waves</dcvalue>
<dcvalue element="date" qualifier="issued">2000</dcvalue>
<dcvalue element="publisher" qualifier="none">GET PUB</dcvalue>
<dcvalue element="identifier" qualifier="citation">Ocean Eng., Vol.27; 345-363p.</dcvalue>
</dublin_core>
/eprints/Ocean_Eng_27_345.pdf
item_3911
<dublin_core schema="dc">
<dcvalue element="contributor" qualifier="author">Phatarpekar, P.V.</dcvalue>
<dcvalue element="title" qualifier="none">A comparative study on growth performance</dcvalue>
<dcvalue element="identifier" qualifier="citation">Aquaculture, Vol.181; 141-155p.</dcvalue>
<dcvalue element="type" qualifier="none">Journal Article</dcvalue>
<dcvalue element="language" qualifier="iso">en</dcvalue>
<dcvalue element="subject" qualifier="none">polyculture</dcvalue>
</dublin_core>
/eprints/Aquaculture_181_141.pdf
item_3921
<dublin_core schema="dc">
<dcvalue element="contributor" qualifier="author">Rao, B.R.</dcvalue>
<dcvalue element="contributor" qualifier="author">Veerayya, M.</dcvalue>
<dcvalue element="title" qualifier="none">Influence of marginal highs on the accumulation</dcvalue>
<dcvalue element="description" qualifier="abstract">Twenty five surficial sediment samples were</dcvalue>
</dublin_core>
/eprints/Deep-Sea_Res_(II)_47_303.pdf


Thanks & Regards
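
The three steps above can be done in a single pass with awk, which matters for a huge file. The following is only a minimal sketch, not a tested production script. It assumes that each "item_*" name sits alone on its own line and contains only letters, digits and underscores, and that the opening and closing dublin_core tags each start a line, as in the sample records. The function name split_items and its two arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: split a bibliographic dump into one folder per item.
# Usage (hypothetical name and arguments): split_items INPUT_FILE [OUTPUT_DIR]
split_items() {
    input=$1
    outdir=${2:-.}
    awk -v outdir="$outdir" '
        # Assumption: an item name is alone on its line and uses only
        # letters, digits and underscores, e.g. "item_3908".
        /^item_[0-9A-Za-z_]+$/ {
            # Close the previous XML file so a huge input does not
            # exhaust the open file limit.
            if (xml != "") close(xml)
            dir = outdir "/" $0
            system("mkdir -p \"" dir "\"")
            # Step 2: the fixed "contents" file (tab-separated).
            contents = dir "/contents"
            printf "license.txt\tbundle:LICENSE\n" > contents
            close(contents)
            xml = dir "/dublin_core.xml"
            copying = 0
            next
        }
        # Step 3: copy from the opening tag through the closing tag.
        /^<dublin_core schema="dc">/ && xml != "" { copying = 1 }
        copying { print > xml }
        /^<\/dublin_core>/ { copying = 0 }
    ' "$input"
}
```

Calling split_items records.txt out (both names are placeholders) would create out/item_3908/contents and out/item_3908/dublin_core.xml, and likewise for the other items. Lines outside the dublin_core block, such as the /eprints/...pdf paths, are skipped. If the file has Windows line endings or trailing spaces after the item names, the patterns would need to be loosened.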
 
