Text file parsing and comparison


 
# 1  
Old 09-12-2018

I have two files (first.txt and second.txt):

Code:
more first.txt 

        cat mammal

        lizard reptile

        Elephant mammal

        ant Insecta


Code:
more second.txt 

        ant     termite

        ant     army_ant

        human   man

        human   woman


I want to make a third file that takes the relevant entry from the second column of the first file and pastes it as a third column into the second file wherever the first columns of the first and second file match.



Code:
more third.txt 
        ant     termite insecta
        ant     army_ant        insecta
        human   man     mammal
        human   woman   mammal

Note: The third file that I included here to illustrate my point shouldn't have "human man mammal" or "human woman mammal". I was wrong. Check the entire thread for an explanation.

Last edited by cs_novice; 09-17-2018 at 01:44 PM..
# 2  
Old 09-12-2018
What have you tried so far and where exactly are you stuck?
And how did you arrive at human man mammal in your desired output from the sample inputs you gave?
This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 09-12-2018
To clarify, the third.txt file is what I expect as output, and I just put it out for illustration purposes. Regarding my attempts: I tried capturing the data from the first file in an awk associative array, but I am unable to think of the right conditional statement to print the corresponding entries from the first file as the third column in the second file.
# 4  
Old 09-12-2018
Quote:
Originally Posted by cs_novice
To clarify, the third.txt file is what I expect as output, and I just put it out for illustration purposes. Regarding my attempts: I tried capturing the data from the first file in an awk associative array, but I am unable to think of the right conditional statement to print the corresponding entries from the first file as the third column in the second file.
It might help to see the attempt so we could help to straighten it out.
It's a very doable request that has been covered many times in these fora.
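
For reference, a minimal sketch of the associative-array approach described in the quote. It assumes the two files are plain whitespace-separated text without the blank lines and indentation that `more` displayed above; the scratch directory and the array name `class` are made up for illustration, and this is only one possible way to write the condition:

```shell
# Recreate the sample inputs from post #1 in a scratch directory (assumed clean).
workdir=$(mktemp -d)
printf '%s\n' 'cat mammal' 'lizard reptile' 'Elephant mammal' 'ant Insecta' > "$workdir/first.txt"
printf 'ant\ttermite\nant\tarmy_ant\nhuman\tman\nhuman\twoman\n' > "$workdir/second.txt"

# While reading the first file (NR == FNR holds only there), remember column 2
# under the key in column 1. For every line of the second file whose key was
# seen, print the line plus the stored value; other lines are dropped.
awk '
    NR == FNR   { class[$1] = $2; next }
    $1 in class { print $1, $2, class[$1] }
' OFS='\t' "$workdir/first.txt" "$workdir/second.txt" > "$workdir/third.txt"

cat "$workdir/third.txt"
```

With the sample data this prints the two "ant" lines with "Insecta" appended (capitalised as in first.txt). The "human" lines are dropped because first.txt has no "human" entry.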

Last edited by Scott; 09-12-2018 at 05:44 PM.. Reason: Edited
# 5  
Old 09-12-2018
First, my suggestion is to re-read the forum rules: Homework/coursework questions are to be posted in a special section of the forum where special rules apply.

Second, I suggest typing

Code:
man join

into the next available terminal and reading what it says. This thread is

Moderator's Comments:
Mod Comment -closed-


bakunin
# 6  
Old 09-13-2018
Quote:
Originally Posted by bakunin
First, my suggestion is to re-read the forum rules: Homework/coursework questions are to be posted in a special section of the forum where special rules apply.
After checking back with the thread owner I learned that I was wrong and this is not homework. My apologies. This thread is

Moderator's Comments:
Mod Comment - reopened -


bakunin
# 7  
Old 09-13-2018
As I suggested join, I will explain how to use it. Some work will be left over for the reader, and effort on the thread owner's part to solve the problem will be appreciated. The following is a loose translation of the German Wikipedia article, which I also wrote:

join is used to combine information from two input data streams (files or pipelines) and output the result; more than two can be handled by chaining several join calls. The input should be in some sort of record format: a table-like structure made of records, separated by newline characters, which themselves consist of fields separated by field separators.

Example:
Code:
      field separators (here tabs)
            |
    fields  +---------+------+
      |     |         |      |
      |     |         |      |
      |-----|-----+---|-+----|----+
      |     |     |   | |    |    |
      V     V     V   V V    V    V
     Peter      Smith   38      50.000      <--- record
     Paul       Miller  40      55.000      <--- record
     Mary       Myers   32      60.000      <--- record

We see a table of persons with some characteristics: first name, family name, age, income. Each person is described in its own record and each record consists of several fields, each denoting one such characteristic. Note that we could have put captions as table headers but these would NOT be part of the table.

join now creates a relation between two such tables. If a record in one table relates to several records in the other table it will be copied as often as necessary. Here is an example:

Code:
A:               B:            result:
     f1 a           f1 X               f1 a X
     f1 b                              f1 b X
     f1 c                              f1 c X
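
The small example above can be tried directly. This is a sketch only: the scratch directory is an assumption, and the file names A and B are taken from the table.

```shell
# Build the two tiny tables from the example in a scratch directory.
workdir=$(mktemp -d)
printf 'f1 a\nf1 b\nf1 c\n' > "$workdir/A"
printf 'f1 X\n' > "$workdir/B"

# The single record in B is paired with each of the three records in A.
join "$workdir/A" "$workdir/B"
```

This should print "f1 a X", "f1 b X" and "f1 c X", one per line.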

Let us put all together: Suppose we have a file ("tel") with people and their telephone numbers:

Code:
>Name	Telephone
Anna	123456-123
Karl	123456-456
Sandra	123457-789

And we have another file ("fax") with people and their Fax-number:

Code:
>Name	Fax
Anna	345678-997
Leo	345679-998
Sandra	345678-999

Notice that both files are tab-separated again, so that between fields there is always a single tab character. The first try

Code:
$ join tel fax

>Name Telephone Fax
Anna 123456-123 345678-997
Sandra 123457-789 345678-999

would by default join on the first fields (the names) and only output the records available in both files. Database people would call that an inner join. Notice also that we have entered the captions as a pseudo-record.

But this messy output is perhaps not what we want. By default join uses any whitespace as field separator, but it can be told specifically (with the -t option) to use a certain character. This character will in turn also be used in the output.

In addition we can specify a certain order of output fields (-o) if we don't want all of them to appear. The resulting output looks a lot better now:

Code:
$ join -t'<tab>' -o 0,1.2,2.2 tel fax

>Name	Telephone	Fax
Anna	123456-123	345678-997
Sandra	123457-789	345678-999
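
In the command above, `<tab>` stands for a literal tab character. One portable way to supply it is sketched here, assuming a POSIX shell; the scratch directory and the variable name `tab` are made up for illustration, and LC_ALL=C is set only so that the ">" caption line sorts first regardless of the local collation rules:

```shell
# Recreate the tab-separated "tel" and "fax" files in a scratch directory.
workdir=$(mktemp -d)
printf '>Name\tTelephone\nAnna\t123456-123\nKarl\t123456-456\nSandra\t123457-789\n' > "$workdir/tel"
printf '>Name\tFax\nAnna\t345678-997\nLeo\t345679-998\nSandra\t345678-999\n' > "$workdir/fax"

# Store a real tab in a variable and hand it to -t.
tab=$(printf '\t')
LC_ALL=C join -t "$tab" -o 0,1.2,2.2 "$workdir/tel" "$workdir/fax"
```

The output should match the three tab-separated lines shown above.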

Furthermore we can change the default inner join to an outer join (include records not available in both files, -a) and we can assign a standard filler text for the missing information (-e):

Code:
$ join -t'<tab>' -a 1 -a 2 -e '(none)' -o 0,1.2,2.2 tel fax

>Name	Telephone	Fax
Anna	123456-123	345678-997
Karl	123456-456	(none)
Leo	(none)	345679-998
Sandra	123457-789	345678-999

Finally, we can also invert the join so that only those records appear in the output which are NOT present in both files, i.e. a list of people having either no phone or no fax:

Code:
$ join -t'<tab>' -v 1 -v 2 -o 0 tel fax

Karl
Leo

One last tip: all input files to join have to be sorted already. In this case "sorted" means sorted on the fields which will be used to join the information. Otherwise some or maybe all records will be mysteriously missing from the output. In my example this was silently done beforehand (this is the reason why I used ">" to mark the captions: it sorts before any letter, so that the header comes out on top).
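
A sketch of the sorting step, applied to the two files from the start of the thread. It assumes the files are clean whitespace-separated text (no blank lines or indentation); the scratch directory and the `.sorted` file names are made up for illustration. Sorting and joining under the same locale (here LC_ALL=C) keeps the two tools in agreement about the order:

```shell
# Recreate the thread's sample inputs in a scratch directory (assumed clean).
workdir=$(mktemp -d)
printf '%s\n' 'cat mammal' 'lizard reptile' 'Elephant mammal' 'ant Insecta' > "$workdir/first.txt"
printf 'ant\ttermite\nant\tarmy_ant\nhuman\tman\nhuman\twoman\n' > "$workdir/second.txt"

# Sort both inputs on the join field (field 1) before joining.
LC_ALL=C sort -k1,1 "$workdir/first.txt"  > "$workdir/first.sorted"
LC_ALL=C sort -k1,1 "$workdir/second.txt" > "$workdir/second.sorted"

# Output the key, then field 2 of second.txt, then field 2 of first.txt.
LC_ALL=C join -o 0,2.2,1.2 "$workdir/first.sorted" "$workdir/second.sorted"
```

With the sample data this should print "ant army_ant Insecta" and "ant termite Insecta". The "human" records have no partner in first.txt and are left out by the default inner join.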

The implementation of this is now left to the interested reader who is, by now, surely eager to try his newfound powers on his data. Be sure to post your results so that others can learn from your achievements as well as your mistakes.

I hope this helps.

bakunin