I have a rather large file with a few million lines looking like this:
In this file the lines can be split into different records with a name (starting with >) and the encoded information/sequence (001010...) associated with the header. Now, I need to add some code to the header according to the following file:
The results should look like this:
The two files (seq.txt, code.txt) are not sorted, but the number of records is identical.
I could use sed to change one record header at a time
or maybe write the commands into a file and execute it
but this might take some time. Does anybody have a faster and maybe more elegant way for me to modify the record headers?
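The per-record approach described above might look like the following sketch (the record name `seq1` and code `ABCD` are invented for illustration, since the thread's actual data was not shown):

```shell
# One sed substitution per record header; each invocation re-reads the
# entire multi-million-line file, so n records mean n full passes.
printf '>seq1\n001010\n' > seq.txt                      # assumed record format
sed 's/^>seq1$/>seq1;ABCD/' seq.txt > seq.tmp && mv seq.tmp seq.txt
cat seq.txt
```

With millions of records this means millions of full passes over the file, which is exactly the slowness the question is about.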
Yes, editing a huge file once as opposed to editing a huge file n times for n lines would certainly be preferable!
This should work efficiently for anywhere up to millions of sequences listed in code.txt:
It works because awk has associative arrays: you can do ARRAY["something"]="ABCD". And NR==FNR means 'do this only for the first file listed'. So it reads the entire list into an associative array, then reads through the huge file hunting for relevant lines, substituting where appropriate, and printing everything.
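The awk command itself did not survive in this copy of the thread, but from the description here and the follow-up post below it was presumably along these lines (the sample records are invented; only the `-F ";"`, `A[$1]=$0`, and `/^>/ && ($1 in A)` parts are confirmed by the thread):

```shell
# Hypothetical sample data; the real seq.txt/code.txt were not shown.
cat > code.txt <<'EOF'
>seq1;ABCD
>seq2;EFGH
EOF
cat > seq.txt <<'EOF'
>seq2
001010
>seq1
110011
EOF

# NR==FNR is true only while reading the first file (code.txt): store
# each full line keyed by its header part before the ';'.  For the
# second file, replace any header found in the array, then print every
# line (the trailing 1).
awk -F';' 'NR==FNR { A[$1] = $0; next }
           /^>/ && ($1 in A) { $0 = A[$1] }
           1' code.txt seq.txt
```

Both files are read exactly once, so the total cost is one pass over each file plus constant-time array lookups.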
Just a side note: Many shells (for instance bash and zsh) have associative arrays too. The problem is that the OP did not specify whether he wants to restrict his solution to a particular shell, and the code snippet he wrote would run in several shells.
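For comparison, a minimal sketch of the same lookup-table idea in bash 4+ (zsh would use `typeset -A`; the key and value here are invented for illustration):

```shell
declare -A newhdr                  # bash 4+ associative array
newhdr[seq1]=">seq1;ABCD"          # map record name -> new header line
echo "${newhdr[seq1]}"
```

For a multi-gigabyte file, though, a shell read-loop over every line would be far slower than the single awk pass.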
Dear Corona,
Thanks for the help and the explanation. I'm not sure I understand the solution completely.
A[$1]=$0 means I read everything from the first file provided, and because of -F ";" each line in the first file is split at the semicolon. Is /^>/ && ($1 in A) the if statement, i.e. if the line starts with a ">" sign and $1 is somewhere in the array? But why $1? Is that not meant for file two, or is everything after the ";" meant for file two?
It would be great if you could find the time to explain the awk array to me a bit more. I really appreciate your help.
Moderator's Comments:
Please use CODE / ICODE tags as required by forum rules!
Last edited by RudiC; 04-14-2017 at 03:58 AM..
Reason: Added ICODE tags.