Match text to lines in a file, iterate backwards until text or text substring matches, print to file

# 1  

Hi all,

I'm trying to do this in shell/bash with sed/awk/grep.

I have two files, one containing one column, the other containing multiple columns (comma delimited).

Code:
file1.txt
abc12345
def12345
ghi54321
...

Code:
file2.txt
abc1,text1,texta
abc,text2,textb
def123,text3,textc
gh,text4,textd
...

I'm trying to take each line in file1 and match its full text against the first column of file2. If there is no match, I want to drop one character at a time from the end and try again, until it matches the first column in file2. I need to loop through all of file2 and write every line whose first column matches the text, or any prefix of it, to another file. Output file3 will essentially contain the original text from file1 concatenated with each matching line from file2.

output example:

Code:
file3.txt
abc12345,abc1,text1,texta 
abc12345,abc,text2,textb
def12345,def123,text3,textc
ghi54321,gh,text4,textd
...

any help would be much appreciated.
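For reference, here is a minimal sketch of the approach described above (whole text first, then drop one trailing character at a time) in plain POSIX sh. The function name prefix_match is hypothetical, and it assumes file2 is comma-delimited with the key in column 1. It re-reads file2 once per prefix, so it is only practical for small files.

```shell
#!/bin/sh
# Sketch of the approach from the question: try the whole file1 line,
# then ever shorter prefixes, printing every file2 line whose 1st column
# equals the current prefix. prefix_match is a hypothetical name.
# Assumes file2 is comma-delimited with the key in column 1.
prefix_match() {  # usage: prefix_match file1 file2
  while IFS= read -r line; do
    prefix=$line
    while [ -n "$prefix" ]; do
      # scan all of file2 for an exact 1st-column match on this prefix
      while IFS=, read -r key rest; do
        if [ "$key" = "$prefix" ]; then
          printf '%s,%s,%s\n' "$line" "$key" "$rest"
        fi
      done < "$2"
      prefix=${prefix%?}   # drop one trailing character
    done
  done < "$1"
}
```

Usage: prefix_match file1.txt file2.txt > file3.txt. Because the longest prefix is tried first, longer matches (abc1) are printed before shorter ones (abc) for the same file1 line.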
# 2  
Our general policy is for everyone here to "try and write their own script first" and then post what you tried.

Sometimes, our members forget and respond like a "script writing service"; but that is not our policy.

So please post the code you tried to write "on your own" and any error messages you got.

Thanks!
# 3  
It is better to cycle through the 1st column of file2 and look for a value* match in file1, i.e. a line of file1 that starts with that value.
In shell it can be done with a case statement.
# 4  
shogun1970

Apart from showing us your attempts at this problem, could you also indicate if the order of records in the output file is important?

The problem description seems to indicate that a direct match record, when available, should be listed first and then other matches should be displayed in the same order as they appear in file1.txt.

However if the order of records in the output file is unimportant, the solution can be simplified a fair bit.
# 5  
A shell script that works as I described:
Code:
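#!/bin/sh
# read from f1, print in this order
# (IFS= and -r keep each line exactly as written)
while IFS= read -r f1line
do
  # read from f2, find matches => print
  while IFS="," read -r f2col1 f2othercols
  do
    case $f1line in
    ("$f2col1"*)
      # quoted "$f2col1" is matched literally; * allows any remaining suffix
      printf '%s\n' "$f1line,$f2col1,$f2othercols"
      ;;
    esac
  done < file2.txt
done < file1.txt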

The same idea in awk (file2 is read into an array variable first):
Code:
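#!/bin/sh
awk -F"," '
{
  if (NR==FNR) {
    # read from f2 into associative array col1[]:
    # key = 1st column, value = the rest of the line after the first comma
    col1[$1]=substr($0, length($1)+2)
  } else {
    # read from f1, find matches => print
    # (note: "for (c in col1)" visits keys in an unspecified order)
    for (c in col1)
      if (c == substr($0,1,length(c)))
        print $0 FS c FS col1[c]
  }
}
' file2.txt file1.txt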
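If the output order matters (the question raised in #4), the awk loop above can be varied to keep file2 in a numerically indexed array instead of an associative one. This is a sketch, and the function name ordered_prefix_join is hypothetical. For each file1 line, matches are then printed in file2 order, and duplicate first-column values in file2 are all kept instead of overwriting each other.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): like the "for (c in col1)" version,
# but keeps file2 lines in a numerically indexed array so that
#  - output follows file2 order (for-in order is unspecified in awk), and
#  - duplicate 1st-column keys in file2 are not overwritten.
ordered_prefix_join() {  # usage: ordered_prefix_join file1 file2
  awk -F"," '
  NR == FNR {                  # first file on the command line: file2
    n++
    key[n]  = $1
    rest[n] = substr($0, length($1) + 2)   # everything after the first comma
    next
  }
  {                            # second file: file1
    for (i = 1; i <= n; i++)
      if (key[i] == substr($0, 1, length(key[i])))
        print $0 FS key[i] FS rest[i]
  }
  ' "$2" "$1"
}
```

Usage: ordered_prefix_join file1.txt file2.txt > file3.txt. The cost is one comparison per file2 line for every file1 line, the same as the associative-array loop.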

In awk your proposed way can be implemented with no big overhead:
Code:
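#!/bin/sh
awk -F"," '
{
  if (NR==FNR) {
    # read from f2 into associative array col1[]
    # (value = the rest of the line after the first comma)
    col1[$1]=substr($0, length($1)+2)
  } else {
    # read from f1: try the whole line, then ever shorter prefixes;
    # each prefix is a single hash lookup in col1[] => print on a hit
    for (i=length($0); i>=1; i--)
      if ((c=substr($0,1,i)) in col1)
        print $0 FS c FS col1[c]
  }
}
' file2.txt file1.txt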

# 6  
Try also
Code:
awk -F, 'FNR==NR {T[$0]; next} {for (t in T) if (t ~ $1) print t, $0}' OFS=, file[12].txt
abc12345,abc1,text1,texta
abc12345,abc,text2,textb
def12345,def123,text3,textc
ghi54321,gh,text4,textd

# 7  
The if (t ~ $1) is an RE match that is "fuzzy" unless it is anchored: $1 can match anywhere inside t, not only at its start.
It should be if (t ~ ("^" $1)); the ^ anchor means the string $1 must occur at the beginning of string t.
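Putting #6 and #7 together, here is a sketch of the anchored version, plus a variant that compares literally with index() so that characters such as . or * in the first column of file2 are not treated as regex metacharacters. The function names are hypothetical; arguments are file1 then file2, as in #6.

```shell
#!/bin/sh
# Post #6 with the ^ anchor from post #7 (anchored_join), and a literal
# variant using index() (literal_join). Both function names are
# hypothetical; arguments are file1 then file2.
# For each file2 line, matching file1 lines come out in for-in order,
# which awk leaves unspecified.
anchored_join() {
  awk -F, 'FNR==NR {T[$0]; next}
    {for (t in T) if (t ~ ("^" $1)) print t, $0}' OFS=, "$1" "$2"
}
# index(t, $1) == 1 holds only when $1 starts at position 1 of t, and $1
# is compared as a plain string, so "." or "*" in it are not regex syntax.
literal_join() {
  awk -F, 'FNR==NR {T[$0]; next}
    {for (t in T) if (index(t, $1) == 1) print t, $0}' OFS=, "$1" "$2"
}
```

For example, with a file2 key of a.c, anchored_join still matches abc12345 (the . matches any character), while literal_join does not.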
This user gave thanks to MadeInGermany for this post.