08-13-2011
Selecting Specific Columns and Inserting TAB as the Delimiter
Hi,
I am writing a Perl script for the following:
I have a pipe-delimited data file with 231 lines of header information and 4 lines of footer information. The file has about 1.2 million lines in total, including the header and footer.
For example:
Header Information:
Quote:
START-OF-FILE
FILENAME=fixedincome_bo_euro.out
DATA=bo
REGION=euro
TYPE=out
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
... so on 231 Lines
Footer Information:
Quote:
END-OF-DATA
DATARECORDS=1221264
TIMEFINISHED=Fri Aug 12 18:57:09 BST 2011
END-OF-FILE
The data looks like this:
Each line has around 210 columns and is pipe delimited.
Quote:
TT3069982 Corp|0|198|FSPIN|4.000000| | |FINE SPINNERS|FINE SPIN-CALLED|INDUSTRIAL|Corp|2|FIXED|PERP/CALL|PERPETL PAY,EX-DIV|3|DOMESTIC|EN|GBP|MORTGAGE BACKED|2000000.00|.00|1.0000|1.0000|1.00| |NOT LISTED|100.00000| | |N.A.|N.A.| |100.000000| | | | | | | | | | | | | | | | | | | | | |234953|500000|TT3069982| | | | | | | | | |N.A.| | | | | | | | | | | | |Y|N|N| | | |GB| |Basic Materials|Chemicals|Chemicals-Fibers|N.A.|GB|FSPIN 4 03/29/49|N| |DOMESTIC| |N.A.| | |N| |N|COTT3069982|Fine Spinners|GBP|GBP|N|N|Y|1|N|N|GBP|N|N|Y|19920228|FINE SPINNERS|Anytime| |N.A.| | |N|N|EN|EN|Does Not Apply|20490329|N|42| |Y|N|100.000000|N|20110820|.000000000| |N| | | | |N.A.|N.A.|N.A.|N.A.|N.A.| | | | | | |N|N|N|N| |Grandfathered| |2| | |N.A.|N| | |N| | | | |N| | |20490329| | |N|N|N| | |N|3| | | |N.A.|2| |41|CALENDAR| |N|N|BBG00035Y4Y1|
The output file should contain only specific columns from each line and should be TAB delimited.
Specific Columns:
Quote:
3 4 5-7 10 11 12 13 15 16-19 20-24 25-26 27 28-32 33 36 37 40 55-58 59 60
61 62 63-66 68 69-72 73 74-75 76 77 78-79 80-86 87 88-94 95 96-99 100 101-103 105-107 109-110 112-123 125-128 130-131 133-135 137 111 124 132 136 187 Only.
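A column list like the one above can be expanded into array indices rather than typed out by hand. The following is only a sketch: `expand_columns` is a hypothetical helper name, and it assumes the column numbers in the list are 1-based (the post does not say), so each one is reduced by one to get a Perl array index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand a spec such as "3 4 5-7 10" into zero-based array indices.
# Assumption: column numbers in the spec are 1-based.
sub expand_columns {
    my ($spec) = @_;
    my @idx;
    for my $token (split ' ', $spec) {
        if ($token =~ /^(\d+)-(\d+)$/) {
            # A range such as 5-7 becomes indices 4, 5, 6.
            push @idx, ($1 - 1) .. ($2 - 1);
        }
        elsif ($token =~ /^\d+$/) {
            push @idx, $token - 1;
        }
        else {
            die "Unrecognised column token: '$token'\n";
        }
    }
    return @idx;
}

my @idx = expand_columns('3 4 5-7 10');
print join(',', @idx), "\n";    # prints 2,3,4,5,6,9
```

The order of the tokens is preserved, so columns listed out of sequence (for example 137 followed by 111) come out in the order given.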
So I have started writing the Perl script:
Quote:
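#!/usr/bin/perl
use strict;
use warnings;

my $file = 'fileA';
open(my $in, '<', $file) or die "could not open file $file: $!";
my @array = <$in>;    # reads the whole file into memory
close $in;

open(my $out, '>', 'outfile') or die "could not open outfile: $!";
# Skip the 231 header lines (indices 0..230) and the 4 footer lines.
print {$out} @array[231 .. $#array - 4];
close $out or die "could not close outfile: $!";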
I am using an array slice to eliminate the header and footer information. Please correct me if I am wrong.
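One way to check the slice bounds is to run the same arithmetic on a tiny array. This is just a sketch: `strip_header_footer` is a hypothetical helper, and the six-element list stands in for the real file with a 2-line header and a 1-line footer.

```perl
use strict;
use warnings;

# Return the lines between a fixed-size header and a fixed-size footer.
sub strip_header_footer {
    my ($lines, $header, $footer) = @_;
    my $last = $#{$lines} - $footer;    # index of the last data line
    return () if $last < $header;       # nothing left between header and footer
    return @{$lines}[$header .. $last];
}

my @lines = ('H1', 'H2', 'row1', 'row2', 'row3', 'F1');
my @data  = strip_header_footer(\@lines, 2, 1);
print "@data\n";    # prints: row1 row2 row3
```

With a header of 231 and a footer of 4 this gives the same range as `231 .. $#array - 4`.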
Now, once I have loaded the file into an array, how do I select the columns listed above and write them out with TAB as the delimiter in Perl?
Would it be easier to use a hash or an array?
Could someone please help me with this? I would really appreciate your thoughts.
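For what it is worth, a plain array slice is probably enough here, with no hash needed. The sketch below is one possible approach, not a tested solution for the real file: `select_columns` and `filter_file` are hypothetical names, `@idx` holds only a short illustrative subset of the wanted columns as zero-based indices, and the file is read line by line instead of being loaded into an array, which is an assumption about what suits a 1.2 million line input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Zero-based indices of the wanted columns; an illustrative subset only
# (columns 3, 4, 5-7 and 10 if the list in the post is 1-based).
my @idx = (2, 3, 4, 5, 6, 9);

# Split one pipe-delimited record and return the chosen fields joined by TAB.
sub select_columns {
    my ($line, $idx) = @_;
    chomp $line;
    # The pipe must be escaped in the pattern; limit -1 keeps trailing empty fields.
    my @fields = split /\|/, $line, -1;
    # Fields missing from a short line become empty strings.
    return join "\t", map { defined $_ ? $_ : '' } @fields[@$idx];
}

# Copy $in to $out, skipping $header leading lines and $footer trailing lines.
sub filter_file {
    my ($in, $out, $idx, $header, $footer) = @_;
    my $seen = 0;
    my @pending;    # buffer that always holds the most recent $footer lines
    while (my $line = <$in>) {
        next if ++$seen <= $header;        # still inside the header
        push @pending, $line;
        next if @pending <= $footer;       # not yet sure this is a data line
        print {$out} select_columns(shift(@pending), $idx), "\n";
    }
    # Whatever remains in @pending is the footer and is dropped.
}

# Usage (file names are placeholders): perl select.pl fileA outfile
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    open my $in,  '<', $src or die "Could not open $src: $!";
    open my $out, '>', $dst or die "Could not open $dst: $!";
    filter_file($in, $out, \@idx, 231, 4);
    close $out or die "Could not close $dst: $!";
    close $in;
}
```

The slice `@fields[@$idx]` picks the columns in the order the indices are listed, so out-of-sequence columns such as 137 followed by 111 keep that order in the output.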