Shell Programming and Scripting: Extract specific content from data and rename its header problem asking
Post 302407018 by alister on Wednesday 24th of March 2010, 11:11:13 AM
Quote:
Originally Posted by malcomex999
You can modify alister code like this...
Code:
awk 'FNR==NR {if (/^>/) p=substr($0,2); 
else a[p]=a[p] $0; next} 
{printf(">%s_0.%02u\n%s\n", $1, ++i[$1], substr(a[$1], $2, ($2>=$3?$3:$3-$2+1)))}' f1 f2

Hi, malcomex999:

That tweak is incorrect, if I understand the modification to f2 correctly. If the second field is greater than the third, then instead of being treated as the beginning index of the substring, it should be considered the end index (and the interpretation of the third field should be swapped accordingly). The correct solution requires that the second argument to substr() be modified as well: when $2 > $3, the substring should start at $3, not $2, and its length should be $2-$3+1.
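A quick way to see the problem is to feed both substr() expressions a reversed range. The sketch below uses a hypothetical ten-character sequence, ABCDEFGHIJ, and the range 7 3, which should select positions 3 through 7 (CDEFG). The function names tweak_extract and fixed_extract are made up for this illustration; the second one uses the swap-before-substr() idea. The +0 coercions are an added precaution so the comparison is numeric for -v variables.

```shell
#!/bin/sh
# Compare malcomex999's substr() tweak with a swap-based fix on a reversed range.
# The sequence and both function names are hypothetical, for illustration only.

# Arguments: sequence, start, end (playing the role of fields $2 and $3 of f2).
tweak_extract() {
    awk -v s="$1" -v b="$2" -v e="$3" 'BEGIN {
        # The tweak keeps b as the start even when b > e, and uses e as the length.
        print substr(s, b, (b+0 >= e+0 ? e : e-b+1))
    }'
}

fixed_extract() {
    awk -v s="$1" -v b="$2" -v e="$3" 'BEGIN {
        # Swap so that b is always the smaller index, then take b..e inclusive.
        if (b+0 > e+0) { t = b; b = e; e = t }
        print substr(s, b, e-b+1)
    }'
}

tweak_extract ABCDEFGHIJ 7 3
fixed_extract ABCDEFGHIJ 7 3
```

tweak_extract prints GHI (positions 7 to 9), while fixed_extract prints CDEFG. For a forward range such as 3 7, both print CDEFG, which is why the tweak can look correct in simple tests.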

By the way, malcomex999 and rdcwayx, thank you very much for the bits awards. They're much appreciated.


Hi, patrick87:

One solution that handles both cases (even when both orderings appear within the same f2):
Code:
awk 'FNR==NR {if (/^>/) p=substr($0,2); else a[p]=a[p] $0; next}
     {if ($2>$3) {t=$2; $2=$3; $3=t}; printf(">%s_0.%02u\n%s\n", $1, ++i[$1], substr(a[$1], $2, $3-$2+1))}' f1 f2

It works identically to my earlier solution except that it tests the second and third fields in f2. If the first index is greater than the second, their values are swapped before the substr() call.
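To check the behaviour end to end, here is a small self-contained run of the command above, wrapped in a shell function named extract_ranges (a name made up here). The f1 and f2 contents are hypothetical sample data, not the data from the thread; the second f2 line lists the same span as the first, but with the indices reversed.

```shell
#!/bin/sh
# Run the two-file awk solution on small, made-up f1 and f2 files.

# extract_ranges F1 F2: the command from the post, wrapped in a function.
extract_ranges() {
    awk 'FNR==NR {if (/^>/) p=substr($0,2); else a[p]=a[p] $0; next}
         {if ($2>$3) {t=$2; $2=$3; $3=t}; printf(">%s_0.%02u\n%s\n", $1, ++i[$1], substr(a[$1], $2, $3-$2+1))}' "$1" "$2"
}

dir=$(mktemp -d)
# f1: FASTA-style records; one sequence may span several lines.
printf '>sequence_1\nABCDEFGHIJ\nKLMNOPQRST\n>sequence_2\nabcdefghij\n' > "$dir/f1"
# f2: sequence name, first index, second index (in either order).
printf 'sequence_1 3 12\nsequence_1 12 3\nsequence_2 5 2\n' > "$dir/f2"

extract_ranges "$dir/f1" "$dir/f2"
rm -r "$dir"
```

With this sample data the output should be >sequence_1_0.01 and >sequence_1_0.02, each followed by CDEFGHIJKL, then >sequence_2_0.01 followed by bcde. Because awk compares fields that look like numbers numerically, 12 > 3 is true here (a string comparison would get it wrong), and the ++i[$1] counter numbers repeated names in order.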

Regards,
Alister
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract specific content from a file

My input file: >sequence_1 ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC ASDSFDFFDFDFFWERERERERFSDFESFSFD >sequence_2 ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS >sequence_3 VEDFGSDGSDGSDGSDGSDGSDGSDG dDFSDFSDFSDFSDFSDFSDFSDFSDF SDGFDGSFDGSGSDGSDGSDGSDGSDG My... (22 Replies)
Discussion started by: patrick87

2. Shell Programming and Scripting

Extract all the content after a specific data

My input: >seq_1 DSASSTRRARRRRTPRTPSLRSRRSDVTCS >seq_3 RMRLRRWRKSCSERS*RRSN >seq_8 RTTGLSERPRLPTTASRSISSRWTR >seq_10 NELPLEKGSLDSISIE >seq_9 PNQGDAREPQAHLPRRQGPRDRPLQAYA+ QVQHRRHDHSRTQH*LCRRRQREDCDRLHR >seq_4 DRGKGQAGCRRPQEGEALVRRCS>seq_6 FA*GLAAQDGEA*SGRG My output: Extract all... (22 Replies)
Discussion started by: patrick87

3. Shell Programming and Scripting

Extract specific data content from a long list of data

My input: Data name: ABC001 Data length: 1000 Detail info Data Direction Start_time End_time Length 1 forward 10 100 90 1 forward 15 200 185 2 reverse 50 500 450 Data name: XFG110 Data length: 100 Detail info Data Direction Start_time End_time Length 1 forward 50 100 50 ... (11 Replies)
Discussion started by: patrick87

4. Shell Programming and Scripting

Remove specific pattern header and its content problem facing

Input file: >TRACK: Position: 1 TYPE: 1 Pos: SVAVPQRHHPGGTVFREPIIIPAIPRLVPGWNKPIIIGRHAFGDQYRATDRVIPGPGKLE LVYTPVNGEPETVKVYDFQGGGIAQTQYNTDESIRGFAHASFQMALLKGLPLYMSTKNTI LKRYDGRFKDIFQEIYESTYQKDFEAKNLWYEHRLIDDMVAQMIKSEGGFVMALKNYDGD >TRACK: Position: 1 TYPE: 2 Pos: FAHASFQMALLKGLPLYMS... (8 Replies)
Discussion started by: patrick87

5. Shell Programming and Scripting

Way to extract detail and its content above specific value problem asking

Input file: >position_10 sample:68711 coords:5453-8666 number:3 type:complete len:344 MSINQYSSDFHYHSLMWQQQQQQQQHQNDVVEEKEALFEKPLTPSDVGKLNRLVIPKQHA ERYFPLAAAAADAVEKGLLLCFEDEEGKPWRFRYSYWNSSQSYVLTKGWSRYVKEKHLDA NRTS* >position_4 sample:68711 coords:553-866 number:4 type:partial len:483... (7 Replies)
Discussion started by: patrick87

6. Shell Programming and Scripting

mailx requirement - email body header in bold and data content in normal text

Dear all- I have a requirement to send an email via email with body content which looks something below- Email body contents -------------------- RequestType: Update DateAcctOpened: 1/5/2010 Note that header information and data content should be normal text.. Please advice on... (5 Replies)
Discussion started by: sureshg_sampat

7. Shell Programming and Scripting

Extract all content that match exactly only specific word

Input: 21 templeta parent 35718 36554 . - . ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set; 21 templeta kids 35718 36554 . - . ID=_52; Parent=parent_cluster_5085.21.11; 21 templeta ... (7 Replies)
Discussion started by: patrick87

8. Shell Programming and Scripting

Help with rename header content based on reference file problem

I got long list of reference file >data_tmp_number_22 >data_tmp_number_12 >data_tmp_number_20 . . Input file: >sample_data_1 Math, 5, USA, tmp SDFEWRWERWERWRWER FSFDSFSDFSDGSDGSD >sample_data_2 Math, 15, UK, tmp FDSFSDFF >sample_data_3 Math, 50, USA, tmp ARQERREQR . . Desired... (7 Replies)
Discussion started by: perl_beginner

9. Shell Programming and Scripting

extract specific string and rename file

Hi all, I am working on a small prog.. i have a file.txt which contains random data... K LINES V4 ADD CODE `COMPANY` ADD CODE `DISTRIBUTOR` SEQ NAME^K LINES V5 SEQ NAME^K LINES V6 ADD `PACK-LDATE` SEQ NAME^K^KCOMMAND END^KHEADINFO... (1 Reply)
Discussion started by: mukeshguliao

10. Shell Programming and Scripting

Help with rename data content

Input file: data21_result1 data23_result1 data43_result1 data43_result2 data43_result3 data3_result1 . . data9_result1 Desired output data1_result1 data2_result1 data3_result1 data3_result2 data3_result3 data4_result1 (3 Replies)
Discussion started by: perl_beginner
Archive::Tar::File(3pm) 				 Perl Programmers Reference Guide				   Archive::Tar::File(3pm)

NAME
       Archive::Tar::File - a subclass for in-memory extracted file from Archive::Tar

SYNOPSIS
           my @items = $tar->get_files;

           print $_->name, ' ', $_->size, "\n" for @items;

           print $object->get_content;
           $object->replace_content('new content');

           $object->rename( 'new/full/path/to/file.c' );

DESCRIPTION
       Archive::Tar::File provides a neat little object layer for in-memory extracted files. It's mostly used internally in Archive::Tar to tidy up the code, but there's no reason users shouldn't use this API as well.

   Accessors
       A lot of the methods in this package are accessors to the various fields in the tar header:

       name      The file's name
       mode      The file's mode
       uid       The user id owning the file
       gid       The group id owning the file
       size      File size in bytes
       mtime     Modification time. Adjusted to mac-time on MacOS if required
       chksum    Checksum field for the tar header
       type      File type -- numeric, but comparable to exported constants -- see Archive::Tar's documentation
       linkname  If the file is a symlink, the file it's pointing to
       magic     Tar magic string -- not useful for most users
       version   Tar version string -- not useful for most users
       uname     The user name that owns the file
       gname     The group name that owns the file
       devmajor  Device major number in case of a special file
       devminor  Device minor number in case of a special file
       prefix    Any directory to prefix to the extraction path, if any
       raw       Raw tar header -- not useful for most users

   Methods
       Archive::Tar::File->new( file => $path )
           Returns a new Archive::Tar::File object from an existing file.

           Returns undef on failure.

       Archive::Tar::File->new( data => $path, $data, $opt )
           Returns a new Archive::Tar::File object from data.

           $path defines the file name (which need not exist), $data the file contents, and $opt is a reference to a hash of attributes which may be used to override the default attributes (fields in the tar header), which are described above in the Accessors section.

           Returns undef on failure.

       Archive::Tar::File->new( chunk => $chunk )
           Returns a new Archive::Tar::File object from a raw 512-byte tar archive chunk.

           Returns undef on failure.

       $bool = $file->extract( [ $alternative_name ] )
           Extract this object, optionally to an alternative name.

           See "Archive::Tar->extract_file" for details.

           Returns true on success and false on failure.
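The accessors above map onto fixed-width fields of a tar header block. As a rough shell illustration (not Archive::Tar's implementation), the sketch below pulls the name, size and type fields out of the first 512-byte header of a freshly created archive, using offsets from the POSIX ustar layout: name at byte 0 (100 bytes), size at byte 124 (12 bytes of octal text) and typeflag at byte 156 (1 byte). It assumes a tar that accepts --format=ustar, such as GNU tar or bsdtar; the helper name header_field is hypothetical.

```shell
#!/bin/sh
# Read raw ustar header fields from the first 512-byte header block.
# header_field is a hypothetical helper; offsets follow the POSIX ustar layout.

# header_field ARCHIVE OFFSET LENGTH: print one field with NUL padding removed.
header_field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000'
}

dir=$(mktemp -d)
printf 'hello\n' > "$dir/hello.txt"
# Plain ustar format, so no pax extended header precedes the file's own header.
(cd "$dir" && tar --format=ustar -cf archive.tar hello.txt)

name=$(header_field "$dir/archive.tar" 0 100)                   # name: 100 bytes at offset 0
size_oct=$(header_field "$dir/archive.tar" 124 12 | tr -d ' ')  # size: octal digits at offset 124
type=$(header_field "$dir/archive.tar" 156 1)                   # typeflag: '0' = regular file, '5' = directory
size=$((0$size_oct))                                            # leading 0 makes $((...)) read it as octal

echo "name=$name size=$size type=$type"
rm -r "$dir"
```

With GNU tar this should print name=hello.txt size=6 type=0. The type shown here is the raw ustar typeflag character; the numeric constants that Archive::Tar exports for its type accessor are described in Archive::Tar's own documentation.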
       $path = $file->full_path
           Returns the full path from the tar header; this is basically a concatenation of the "prefix" and "name" fields.

       $bool = $file->validate
           Done by Archive::Tar internally when reading the tar file: validate the header against the checksum to ensure the tar file is intact.

           Returns true on success, false on failure.

       $bool = $file->has_content
           Returns a boolean to indicate whether the current object has content. Some special files, such as directories, never have any content. This method is mainly to make sure you don't get warnings for using uninitialized values when looking at an object's content.

       $content = $file->get_content
           Returns the current content for the in-memory file.

       $cref = $file->get_content_by_ref
           Returns the current content for the in-memory file as a scalar reference. Normal users won't need this, but it will save memory if you are dealing with very large data files in your tar archive, since it will pass the contents by reference, rather than make a copy of it first.

       $bool = $file->replace_content( $content )
           Replace the current content of the file with the new content. This only affects the in-memory archive, not the on-disk version until you write it.

           Returns true on success, false on failure.

       $bool = $file->rename( $new_name )
           Rename the current file to $new_name.

           Note that you must specify a Unix path for $new_name, since per tar standard, all files in the archive must be Unix paths.

           Returns true on success and false on failure.

       $bool = $file->chmod( $mode )
           Change mode of $file to $mode. The mode can be a string or a number which is interpreted as octal whether or not a leading 0 is given.

           Returns true on success and false on failure.

       $bool = $file->chown( $user [, $group] )
           Change owner of $file to $user. If a $group is given, that is changed as well. You can also pass a single parameter with a colon separating the user and group, as in 'root:wheel'.

           Returns true on success and false on failure.
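The validate method is described only as checking the header against its checksum. Under the POSIX ustar rules, that checksum is the sum of all 512 header bytes taken as unsigned values, with the 8-byte chksum field at offset 148 counted as if it held spaces, and it is stored as octal text. The shell sketch below recomputes it for the first header of an archive. It illustrates that rule under this assumption and is not Archive::Tar's code; header_checksum_ok is a hypothetical name, and the sketch assumes od, dd and a tar that accepts --format=ustar.

```shell
#!/bin/sh
# Recompute a ustar header checksum and compare it with the stored value.
# header_checksum_ok is a hypothetical helper, not part of Archive::Tar.

# header_checksum_ok ARCHIVE: exit status 0 if the first header's checksum matches.
header_checksum_ok() {
    # Stored value: octal digits at offset 148, padded with NUL and/or space.
    stored=$(dd if="$1" bs=1 skip=148 count=8 2>/dev/null | tr -d '\000 ')
    # Recomputed value: sum the 512 unsigned header bytes, counting the
    # chksum field (offsets 148-155) as ASCII spaces (value 32).
    computed=$(od -An -v -tu1 -N512 "$1" |
        awk '{ for (i = 1; i <= NF; i++) { sum += (n >= 148 && n < 156) ? 32 : $i; n++ } }
             END { print sum }')
    [ "$((0$stored))" -eq "$computed" ]
}

dir=$(mktemp -d)
printf 'hello\n' > "$dir/hello.txt"
(cd "$dir" && tar --format=ustar -cf archive.tar hello.txt)
if header_checksum_ok "$dir/archive.tar"; then echo "checksum OK"; else echo "checksum mismatch"; fi
rm -r "$dir"
```

Returning an exit status lets the check sit directly in an if test. Overwriting any header byte outside the checksum field (for example with dd and conv=notrunc) should make it fail.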
   Convenience methods
       To quickly check the type of an "Archive::Tar::File" object, you can use the following methods:

       $file->is_file
           Returns true if the file is of type "file".

       $file->is_dir
           Returns true if the file is of type "dir".

       $file->is_hardlink
           Returns true if the file is of type "hardlink".

       $file->is_symlink
           Returns true if the file is of type "symlink".

       $file->is_chardev
           Returns true if the file is of type "chardev".

       $file->is_blockdev
           Returns true if the file is of type "blockdev".

       $file->is_fifo
           Returns true if the file is of type "fifo".

       $file->is_socket
           Returns true if the file is of type "socket".

       $file->is_longlink
           Returns true if the file is of type "LongLink". Should not happen after a successful "read".

       $file->is_label
           Returns true if the file is of type "Label". Should not happen after a successful "read".

       $file->is_unknown
           Returns true if the file type is "unknown".

perl v5.16.2                          2012-10-25                      Archive::Tar::File(3pm)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.