Adding lines to files based on their names


 
# 1  
Old 06-24-2010

I have several files in a folder, named something like this:
Quote:
1a-Pat1.fas
1a-Pat32.fas
1b-Pat16.fas
6-Pat76.fas
4-Pat243.fas
3-Pat101.fas
5-Pat33.fas
6-Pat93.fas
2-Pat5.fas
If the file name starts with 1a, I would like to add the following at the very beginning of the file:
Quote:
>Reference1a
AGCATGACGATCAGTGTGCGGGTGA
If the file starts with 1b then I should add the following
Quote:
>Reference1b
AGCATGTGTGCGGGTGA
For files starting with 6
Quote:
>Reference6
AGCATGTGTGCGGGTGATTTTTT
And so on. In the end, each file should have a reference sequence at the very top that matches the beginning of the file name.
Thanks in advance.
# 2  
Old 06-24-2010
wrong solutions. deleted.

Last edited by rdcwayx; 06-25-2010 at 02:44 AM..
# 3  
Old 06-24-2010
This assumes that you have files containing the data to be added, each named after its reference. For example, the file "1a.header" contains the header for reference id "1a".
Code:
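function add_reference
{
  typeset file=$1
  typeset ref_id=${file%%-*}    # prefix before the first "-", e.g. "1a"
  typeset temp=${file}.tmp
  # Write header + original content to a temp file, then replace the original
  cat "${ref_id}.header" "$file" > "$temp" && mv "$temp" "$file"
}

for file in *.fas
do
  add_reference "$file"
done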

If you want to keep all the reference strings within your script (instead of external files), you can use associative arrays. Since older ksh (ksh88) does not support associative arrays, you will have to use either ksh93 or awk.
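If neither ksh93 nor awk is an option, a plain case statement can stand in for the associative array in any POSIX shell. The following is only a minimal sketch under that assumption: the function names reference_for and add_reference are made up for illustration, only the three sequences given in the question are filled in, and the remaining prefixes would need their own case branches.

```shell
#!/bin/sh
# Sketch: prepend a reference record chosen by the file name prefix.

# Print the reference for a prefix; return non-zero if none is known.
reference_for() {
  case $1 in
    1a) printf '%s\n' '>Reference1a' 'AGCATGACGATCAGTGTGCGGGTGA' ;;
    1b) printf '%s\n' '>Reference1b' 'AGCATGTGTGCGGGTGA' ;;
    6)  printf '%s\n' '>Reference6' 'AGCATGTGTGCGGGTGATTTTTT' ;;
    *)  return 1 ;;
  esac
}

# Build "reference + original content" in a temp file, then swap it in.
add_reference() {
  file=$1
  ref_id=${file##*/}      # drop any directory part
  ref_id=${ref_id%%-*}    # keep everything before the first "-"
  tmp=$file.tmp
  if reference_for "$ref_id" > "$tmp" && cat "$file" >> "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    echo "no reference for $file, skipped" >&2
    return 1
  fi
}

# Process every .fas file in the current folder; unknown prefixes are skipped.
for f in *.fas; do
  if [ -f "$f" ]; then
    add_reference "$f" || continue
  fi
done
```

Run it from the folder that holds the .fas files. Note that it is not idempotent: running it twice prepends the reference twice.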
# 4  
Old 06-24-2010
Another way: create header files named after the prefixes (1a, 1b, and so on), each holding the corresponding reference.

For example:

Code:
$ cat 1a
>Reference1a
AGCATGACGATCAGTGTGCGGGTGA

Then use the script below:
Code:
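for file in *.fas
do
  # Prefix before the first "-" is the name of the header file
  str=$(echo "$file" | cut -d- -f1)
  # Result goes to a new file, e.g. 1a-Pat1.fas_new; the original is untouched
  cat "$str" "$file" > "${file}_new"
done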


Last edited by rdcwayx; 06-25-2010 at 02:43 AM..
# 5  
Old 06-25-2010
I would rather have all the references within my script (I really do not like the idea of having external files).
Code:
for file in `ls 1a*.fas`
do 
  echo ">Reference1a
AGCATGACGATCAGTGTGCGGGTGA " >> $file
done

I am not getting the expected result. All the information in the file is being replaced by the reference sequence.
Quote:
If you want to keep all the reference strings within your script (instead of external files), you can use associative arrays. Since ksh does not support associative arrays, you will have to use either ksh93 or awk.
I have been trying with awk, but I just cannot get it to do what I want.
# 6  
Old 06-25-2010
Quote:
Originally Posted by Xterra
I would rather have all the references within my script (I really do not like the idea of having external files).
Code:
for file in `ls 1a*.fas`
do 
  echo ">Reference1a
AGCATGACGATCAGTGTGCGGGTGA " >> $file
done

I am not getting the expected result. All the information in the file is being replaced by the reference sequence.

I have been trying with awk, but I just cannot get it to do what I want.
I am not very experienced with awk, but the original code I posted can be modified to use ksh93 as follows:

Code:
#!/bin/ksh93

typeset -A reference_headers

##
## Add all the valid references in this associative array
##
reference_headers=(
[1a]=">Reference1a
AGCATGACGATCAGTGTGCGGGTGA"
[1b]=">Reference1b
AGCATGTGTGCGGGTGA"
)

function add_reference
{
  typeset file=$1
  typeset ref_id=${file%%-*}    # prefix before the first "-", e.g. "1a"
  typeset temp=${file}.tmp
  # Skip files whose prefix has no reference defined
  [[ -n ${reference_headers[$ref_id]} ]] || return 1
  # Emit the reference first, then the original content
  (echo "${reference_headers[${ref_id}]}"; cat "$file") > "$temp"
  mv "$temp" "$file"
}

for file in *.fas
do
  add_reference "$file"
done

Note that on my Mac OS X the default ksh is itself ksh93, but many Unix systems ship separate executables for ksh and ksh93. If your system has only ksh, run the following command: if it prints output similar to what is shown, it is ksh93; otherwise it is not:
Code:
$ echo ${.sh.version}
Version M 1993-12-28 s+

# 7  
Old 06-25-2010
a_programmer,

I will try that. Hopefully someone will post code that uses awk to accomplish the same task.
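For the awk route asked about here, one possible approach is to keep the reference table in an awk array and derive the prefix from the file name (ARGV[1]) inside BEGIN. This is a sketch rather than a finished script: prepend_reference is a hypothetical name, only the three sequences given in the thread are filled in, and the output goes through a temporary file because POSIX awk has no in-place editing.

```shell
#!/bin/sh
# Sketch: awk picks the reference from the file name prefix and prints it
# ahead of the original content.

prepend_reference() {
  file=$1
  tmp=$file.tmp
  awk '
    BEGIN {
      # Reference table keyed by file name prefix (values from the thread).
      ref["1a"] = ">Reference1a\nAGCATGACGATCAGTGTGCGGGTGA"
      ref["1b"] = ">Reference1b\nAGCATGTGTGCGGGTGA"
      ref["6"]  = ">Reference6\nAGCATGTGTGCGGGTGATTTTTT"

      # Derive the prefix: drop the directory, then everything from the first "-".
      id = ARGV[1]
      sub(/.*\//, "", id)
      sub(/-.*/, "", id)

      # Unknown prefix: stop with a failure status so the file is left alone.
      if (!(id in ref)) exit 1
      print ref[id]
    }
    { print }   # copy the original content unchanged
  ' "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}

# Process every .fas file in the current folder; unknown prefixes are skipped.
for f in *.fas; do
  if [ -f "$f" ]; then
    prepend_reference "$f" || echo "no reference for $f, skipped" >&2
  fi
done
```

Doing the prefix lookup in BEGIN means the reference is printed even for an empty file. As with the other versions, running the script twice prepends the reference twice.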