Adding filename and line number from multiple files to final file


 
# 8  
04-17-2013
Quote:
Originally Posted by bioinfo
Thanks
This script is awesome but somewhat tough for me as I am a beginner. Is there any possibility of adding something simple to my previous code (below) to do the same thing?
Code:
cat *.txt | awk '{print $2, $4}' | sed "/#ainst\|#Time/d" > out.txt
or
cat *.txt | awk 'NR >= 3 && NR <= 1002 {print $2, $4}' > out.txt

Thanks
Besides being unnecessary, piping the files through cat instead of letting awk open them means awk can't recover the filenames. Also, awk can easily perform arithmetic calculations (such as subtracting the number of comment lines found); sed can't.
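A minimal sketch of the difference, using two made-up sample files (the names and contents are assumptions for illustration only): when awk opens the files itself, FILENAME tells it where each line came from; when cat feeds it, awk only sees one continuous stream.

```sh
# Sketch: what awk can see when it opens the files itself versus
# when cat hands it one anonymous stream. Sample files are made up.
start=$PWD
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf 'a\nb\n' > file001.txt
printf 'c\n' > file002.txt

# awk opens each file, so FILENAME names the file each line came from.
direct=$(awk '{ print FILENAME ": " $0 }' file001.txt file002.txt)

# awk reads only standard input here: the lines arrive as one numbered
# stream and nothing tells awk where one file ends and the next begins.
piped=$(cat file001.txt file002.txt | awk '{ print NR ": " $0 }')

printf '%s\n' "$direct" "$piped"
cd "$start" && rm -rf "$tmp"
```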

You said you wanted to print a portion of the file's name on each output line. You can't do that with either of the scripts above. In your scripts, neither awk nor sed has access to the names of the input files, because cat is the only command that opens them.

You said you wanted to delete the 1st two lines (which start with a #) from each file. Your 2nd script can't be made to do that if you keep the cat. It throws away the 1st two lines of the 1st file, keeps the header lines of every later file, and (because of the NR <= 1002 test) stops printing after line 1002 of the combined input. It can easily throw away all lines starting with # (like my awk script did), but because awk sees only one combined input stream, a test based on line numbers can't skip the 1st two lines of any file except the first one.
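A small sketch contrasting the three approaches on hypothetical files that each start with two # header lines: NR counts the combined stream, FNR restarts for each file awk opens, and a /^#/ match ignores position entirely.

```sh
# Sketch: three ways to drop header lines, on hypothetical sample files
# that each start with two '#' lines.
start=$PWD
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf '#header1\n#header2\nx1\n' > file001.txt
printf '#header1\n#header2\ny1\n' > file002.txt

# NR counts lines of the combined stream: only file001.txt loses its header.
by_nr=$(cat file001.txt file002.txt | awk 'NR > 2')

# FNR restarts at 1 in each file awk opens: every file loses its first 2 lines.
by_fnr=$(awk 'FNR > 2' file001.txt file002.txt)

# Matching '#' ignores position, so it works even through cat.
by_pattern=$(cat file001.txt file002.txt | awk '!/^#/')

printf '%s\n---\n' "$by_nr" "$by_fnr" "$by_pattern"
cd "$start" && rm -rf "$tmp"
```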

You said you wanted to print the line number (not counting the comment lines) for each line in your input files. You can easily do that by using the awk script I suggested; you can't do it with either of your pipelines without making the awk portion of your pipeline look a lot more like what I suggested before.
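As an illustration only (a sketch written for this explanation, not a copy of the script from the earlier post), assuming files named fileNNN.txt whose comment lines start with # and whose data lines have at least four whitespace-separated fields, the per-file counter approach could look like this:

```sh
# Sketch: prefix each data line with part of its filename and a per-file
# line number that skips comment lines. Sample data is made up.
start=$PWD
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf '#comment\n#comment\nr1 A x B\nr2 C y D\n' > file001.txt
printf '#comment\n#comment\nr1 E z F\n' > file002.txt

out=$(awk '
FNR == 1 {                      # first line of a new file
    fn = substr(FILENAME, 5, 3) # "001", "002", ... from fileNNN.txt
    n = 0                       # reset the per-file comment counter
}
/^#/ { n++; next }              # count comment lines, print nothing
{ printf("%s\t%d\t%s\t%s\n", fn, FNR - n, $2, $4) }
' file*.txt)

printf '%s\n' "$out"
cd "$start" && rm -rf "$tmp"
```

Because FNR - n is the number of non-comment lines seen so far in the current file, the numbering stays correct even if a comment line appears in the middle of a file rather than only at the top.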

What is it about the awk script I provided that is too tough to understand?
# 9  
04-17-2013
Thanks a lot for letting me know the concepts in detail.

I did not understand the following things:
Code:
FNR==1{ # This is the first line in a new file...

fn = substr(FILENAME, 5, 3) # Save 3 characters from this filename

printf("%s\t%d\t%s\t%s\n", fn, FNR - n, $2, $4)

I have a general question: in which cases should I use cut/paste/cat, grep, sed, or awk? I am really confused.
I am getting different answers while googling.

Thanks again.
# 10  
04-17-2013
Quote:
Originally Posted by bioinfo
Thanks a lot for letting me know the concepts in detail.

I did not understand the following things:
Code:
FNR==1{ # This is the first line in a new file...

The awk utility maintains several variables as it processes a line of text from a file. As you already know, NR is the number of records that have been read from all of the input files. FNR is the number of records that have been read from the current file. When FNR is equal to 1, the condition portion of this awk statement is true and the action portion of the statement will be executed.
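A quick sketch with two hypothetical files shows NR continuing to count across files while FNR restarts at 1, which is what makes FNR==1 a reliable "this is a new file" test:

```sh
start=$PWD
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf 'a\nb\n' > one.txt
printf 'c\nd\n' > two.txt

# Print NR (all input so far) and FNR (current file only) for each line;
# the FNR == 1 rule fires once at the top of every file.
out=$(awk '
FNR == 1 { print "start " FILENAME }
{ print NR, FNR, $0 }
' one.txt two.txt)

printf '%s\n' "$out"
cd "$start" && rm -rf "$tmp"
```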
Quote:
Originally Posted by bioinfo
Code:
fn = substr(FILENAME, 5, 3) # Save 3 characters from this filename

FILENAME is another variable maintained by the awk utility. It contains the name of the file that is being processed. You said your filenames were:
Code:
file001.txt
00000000011   (tens digit of the character position)
12345678901   (units digit of the character position)
file002.txt
file003.txt
...
file020.txt

The substr(string, start, count) function in awk returns count characters from string, starting at character number start. For example, when FILENAME is file001.txt, substr() returns characters 5 through 7 (001) and stores them in the variable fn (i.e., fn will contain the portion of the filename you want to print at the start of each line printed from this input file).
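A sketch of substr() on its own, with literal strings standing in for FILENAME:

```sh
# substr(s, start, count): start is 1-based; count is how many characters.
fn=$(awk 'BEGIN { print substr("file001.txt", 5, 3) }')            # 001
# With count omitted, substr returns everything from start to the end.
ext=$(awk 'BEGIN { print substr("file001.txt", 9) }')              # txt
# A directory prefix shifts the positions.
shifted=$(awk 'BEGIN { print substr("data/file001.txt", 5, 3) }')  # /fi
printf '%s %s %s\n' "$fn" "$ext" "$shifted"
```

Positions are counted from the start of FILENAME exactly as it was given on the command line, so running the script on something like data/file*.txt (a hypothetical directory) would make substr(FILENAME, 5, 3) return /fi instead of the number.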
Quote:
Originally Posted by bioinfo
Code:
printf("%s\t%d\t%s\t%s\n", fn, FNR - n, $2, $4)

The awk printf(format, argument...) function is VERY similar to the C Language printf() function and the printf utility. In this case the function call:
Code:
printf("%s\t%d\t%s\t%s\n", fn, FNR - n, $2, $4)

prints the saved portion of the filename as a character string, a tab character, the current line number in the current file minus the number of lines in the current file starting with # as a decimal numeric string, a tab character, the 2nd field from the current line as a character string, a tab character, and the 4th field from the current line as a character string followed by a newline character.
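To see the format string in isolation, this sketch feeds awk one made-up input line (fn and n are hard-coded stand-ins for the values the full script computes) and compares the result with the shell's printf utility using the same format:

```sh
# One made-up input line; fn and n are hard-coded stand-ins for the
# values the full script would compute.
from_awk=$(echo 'r1 ATG 7 GCA' | awk '{
    fn = "001"
    n = 0
    printf("%s\t%d\t%s\t%s\n", fn, FNR - n, $2, $4)
}')

# The shell printf utility accepts the same format string.
from_shell=$(printf '%s\t%d\t%s\t%s\n' 001 1 ATG GCA)

printf '%s\n' "$from_awk" "$from_shell"
```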
Quote:
Originally Posted by bioinfo
I have a general question: in which cases should I use cut/paste/cat, grep, sed, or awk? I am really confused.
I am getting different answers while googling.

Thanks again.
You should use cut and paste when they do what you need, they do it more simply or more efficiently than your shell's built-in utilities could, AND the job doesn't need more complex processing (such as that provided by awk or sed).
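A minimal sketch of cut and paste on a hypothetical tab-separated file:

```sh
start=$PWD
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf 'r1\tA\tx\tB\nr2\tC\ty\tD\n' > data.tsv
printf '001\n002\n' > ids.txt

# cut: keep fields 2 and 4 (tab is cut's default field delimiter).
cut -f2,4 data.tsv > cols.txt
cols=$(cat cols.txt)

# paste: glue line N of each file together, separated by a tab.
joined=$(paste ids.txt cols.txt)

printf '%s\n' "$cols" "$joined"
cd "$start" && rm -rf "$tmp"
```

One reason awk '{print $2, $4}' is often preferred for space-separated data: cut -d ' ' treats every single space as a delimiter, so runs of spaces produce empty fields, while awk's default field splitting collapses runs of blanks.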

You should use cat when you need to concatenate two or more files into a single output file, when you need to feed the contents of one or more files into a utility that doesn't accept pathname operands, or when your version of cat provides a non-standard extension that performs some text manipulation you need while it copies files.
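A sketch of the first two cases, with made-up file names. tr is a convenient example of a utility that reads only standard input and accepts no pathname operands:

```sh
start=$PWD
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf 'abc\n' > part1.txt
printf 'def\n' > part2.txt

# Case 1: concatenate several files into one output file.
cat part1.txt part2.txt > whole.txt
whole=$(cat whole.txt)

# Case 2: tr accepts no file operands, so cat supplies several files on stdin.
upper=$(cat part1.txt part2.txt | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$whole" "$upper"
cd "$start" && rm -rf "$tmp"
```

For a single file, a shell redirection (tr '[:lower:]' '[:upper:]' < whole.txt) does the same job without the extra cat process.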

You should NEVER use:
Code:
cat *.txt|awk 'awk program'

instead of:
Code:
awk 'awk program' *.txt

Creating an additional process like this takes more system resources to run your command, makes it run slower, and keeps awk from knowing how many files are being processed and what the names of the files are.

Many of the original UNIX utilities were designed to perform a transformation on data read from standard input and write the transformed data to standard output. (These utilities can be called filters.) The idea was that filters could be combined in a pipeline to perform much more complex tasks without making each utility more complex than needed. (This is an example of your basic KISS [Keep It Simple, Stupid] principle.) Unfortunately, many of today's utilities on many systems have forgotten the KISS principle.
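A sketch of the filter idea on made-up sample lines: each stage does one small job, reading standard input and writing standard output.

```sh
# Each stage is a filter: it reads standard input and writes standard output.
#   grep -v '^#'     drop comment lines
#   cut -d ' ' -f2   keep the 2nd space-separated field
#   sort | uniq      bring equal values together, then collapse duplicates
keys=$(printf '#comment\nr1 A x B\nr2 C y A\nr3 A z D\n' |
    grep -v '^#' | cut -d ' ' -f2 | sort | uniq)
printf '%s\n' "$keys"
```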

Even with the original UNIX utilities, there were frequently many different ways to get a job done. Choosing which utilities to use depends on what you are trying to do, your ability to recognize the alternatives available, your ability to use the alternative tools available, and your knowledge of how utilities have evolved on various systems over the years so you know what will work portably on all of the systems you want to use and which code might have to be tweaked if you want to move your script to a different system.

Despite the fact that many of us have degrees in computer science or computer engineering (or both), there is a lot of art (as well as science and engineering) in programming.
# 11  
04-17-2013
Quote:
In which cases should I use cut/paste/cat, grep, sed, or awk? I am really confused.
There is no hard and fast rule. cut and paste go together, and are very useful, along with join. cat is not needed that much. grep finds lines. sed makes quick arbitrary changes. awk works well with fields and can make programs. bash ties it all together, and can make programs. The main thing I would recommend is keep the code easy to read and maintain, even for other viewers. Emphasize readability over performance. Avoid trying to always do everything with awk, or always do everything with perl. It sounds like you already know it's better to learn a variety of commands, such as the key ones you mentioned, and a few others such as uniq, head, tail, and sort, and use the commands within the context of shell scripts.
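Since join came up, here is a minimal sketch with two hypothetical space-separated files. By default join matches lines on their first field, and both inputs must already be sorted on that field:

```sh
start=$PWD
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf '001 alpha\n002 beta\n003 gamma\n' > names.txt
printf '001 10\n003 30\n' > counts.txt

# join pairs lines whose 1st fields match; both files are already sorted on it.
# Lines without a partner (002 here) are dropped by default.
joined=$(join names.txt counts.txt)

printf '%s\n' "$joined"
cd "$start" && rm -rf "$tmp"
```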
# 12  
04-18-2013
Thanks a lot, Don Cragun, for such an extensive explanation, and thanks to hanson44.
Does the amount of space between the lines matter, or can we write an awk program on one line too? Is the spacing only for readability?

Thanks.


Hurray!
I got my output.
Thanks
# 13  
04-18-2013
Quote:
Originally Posted by bioinfo
Thanks a lot, Don Cragun, for such an extensive explanation, and thanks to hanson44.
Does the amount of space between the lines matter, or can we write an awk program on one line too? Is the spacing only for readability?

Thanks.


Hurray!
I got my output.
Thanks
It is logically possible to write any awk script as a single line, if you're willing to type it into your shell. If the awk program is in a shell file to be executed, you'll have to restrict the length of each line in your script to the limits supported by your editor. You can also throw away all of the comments and change all of the variable names to single characters to make the script shorter.
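A sketch showing that the layout is only for the reader: the spread-out and one-line versions below (on made-up input) produce the same output.

```sh
input='#comment
a 1
b 2'

# Spread out, with comments.
multi=$(printf '%s\n' "$input" | awk '
/^#/ {        # comment line: skip it
    next
}
{             # every other line: print field 2
    print $2
}
')

# The same program collapsed onto one line.
single=$(printf '%s\n' "$input" | awk '/^#/ { next } { print $2 }')

printf '%s\n' "$multi" "$single"
```

One place where newlines do matter: the opening { of an action must be on the same line as its pattern. Putting /^#/ on one line and { next } on the next creates two separate rules (a pattern with the default print action, then an action that runs for every line). When collapsing several statements inside one action onto a single line, separate them with ;.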

I choose to write programs in a way that is easy for me to read and understand rather than to try to artificially produce 1-liners. If you ask me about an awk script I submitted here a month ago, I don't want to deal with the obfuscation caused by collapsing an easily read script into a single line.

If you take a script I supplied, modify it slightly to add a new feature, collapse it to a single line, and then ask me to help you debug your new feature, I will definitely be slower to respond, and it will be much more likely that I won't respond at all.
# 14  
04-18-2013
Ok thanks.