Parsing a file based on positional constraints


 
# 1  
Old 12-10-2014

I have a list, file1, like this:
Code:
dog
cow
fox
cat
fish
duck
crow

I want to classify the elements of file1 based on constraints applied to file2. Note that the number of elements (words) in each line of file2 is not fixed. This is my file2:
Code:
cow cat fox dog
cow fox dog
fish crow fox dog cat 
cat dog duck
cow

Explanation of the positional constraints:
- To classify the first line of file1, "dog", check every line of file2 in which it occurs: is it the first or last element of line 1, then of line 2, and so on to the end of the file? If "dog" is in the first or last position every time it occurs, it goes to a new file called file_external. Otherwise it goes to a new file called file_internal. In this case dog also occurs at line 3, position 4 (of 5), so it goes to file_internal.
- The second element in file1 is cow. In file2, cow is in the first or last position every time it occurs, so cow is appended to file_external.
- fox is in a middle position every time it occurs, so it goes to file_internal.
- cat occurs in middle, last, and first positions, so it is appended to file_internal.
- fish and duck go to file_external, as they occur strictly as the first or last element.
- crow will fly into file_internal.
I am looking for help to do this with awk/perl.
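To check the rule by hand, it can help to list every occurrence of each word together with its position. The following is only a sketch under a few assumptions: it recreates the sample file1 and file2 from this post in the current directory, treats whitespace as the separator, and writes to a scratch file named occurrences.txt (a name made up for this example).

```shell
# Recreate the sample files from the post (skip this if you already have them).
printf '%s\n' dog cow fox cat fish duck crow > file1
printf '%s\n' 'cow cat fox dog' 'cow fox dog' 'fish crow fox dog cat' \
              'cat dog duck' 'cow' > file2

# For every word of file1, report the line and field where it occurs in file2
# and whether that field is an edge (first or last) or a middle position.
awk 'NR == FNR { want[$1]; next }            # first file: remember the words
     { for (j = 1; j <= NF; j++)             # second file: look at every field
           if ($j in want)
               print $j, "line " FNR, "pos " j, (j == 1 || j == NF ? "edge" : "middle")
     }' file1 file2 | sort > occurrences.txt

cat occurrences.txt
```

A word belongs in file_external only if all of its lines in this listing say "edge"; a single "middle" line sends it to file_internal.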
# 2  
Old 12-11-2014
Forget my first question; while rereading the problem I see that that case is covered in the description....

So, that just leaves:
  1. What OS are you using?
  2. What have you tried?
  3. And, is this a homework assignment?
# 3  
Old 12-11-2014
I am using Ubuntu and Windows. I am a biologist and do experiments most of the time, but sometimes I write programs to analyze my data. I am not very proficient at programming, though, and this is difficult for me to write. I tried but could not produce correct output. This is a sample file I prepared for testing; I need to run this on a very big dataset.
# 4  
Old 12-11-2014
Try
Code:
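awk     'NR==FNR        {T[$1]; next}                   # file1: remember each word, class still empty
                        {for (i in T)                   # file2: test every word against this line
                                {if (T[i]!="Int")       # once internal, always internal
                                   {if ($1==i || $NF==i) T[i]="Ext"
                                    for (j=2; j<NF; j++) if ($j==i) T[i]="Int"
                                   }
                                }
                        }
         END            {for (i in T) if (T[i]!="") print i > T[i]}     # skip words never seen in file2
        ' file1 file2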
  Int :
dog
crow
cat
fox
  Ext :
fish
duck
cow
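
Since the real dataset is said to be very big, one possible variation is to walk the fields of each file2 line once and look each field up in the word list, instead of looping over the whole word list for every line. This is a sketch, not a tuned solution: it assumes whitespace-separated words, recreates the sample files in the current directory, and uses the output names file_external and file_internal from post #1. Words from file1 that never occur in file2 are written to neither file.

```shell
# Recreate the sample files from post #1 (skip this if you already have them).
printf '%s\n' dog cow fox cat fish duck crow > file1
printf '%s\n' 'cow cat fox dog' 'cow fox dog' 'fish crow fox dog cat' \
              'cat dog duck' 'cow' > file2

rm -f file_external file_internal            # start from a clean slate

awk 'NR == FNR { if (NF) want[$1]; next }    # file1: remember the words
     { for (j = 1; j <= NF; j++) {           # file2: visit each field once
           if (!($j in want)) continue       # not a word we care about
           seen[$j]                          # it occurs at least once
           if (j > 1 && j < NF) internal[$j] # any middle hit decides it
       }
     }
     END { for (w in seen)
               print w > ((w in internal) ? "file_internal" : "file_external")
     }' file1 file2
```

The order of the words inside each output file is whatever awk's `for (w in seen)` happens to produce, so pipe the files through sort if the order matters.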

# 5  
Old 12-11-2014
Quote:
If the position of "dog" occurs every time (whenever it occurs) as a first or last position then it will go to a new file called file_external. If not then it will go to new file file_internal.
Quote:
cat comes in middle, end and first positions, so will append to file_internal.
To me, this is not clear: dog occurs in a first and/or last position and would go to file_ext; if not, it goes to file_int. Question: why does cat go to file_int regardless of its position?

Anyhow, you might want to try this code:
Code:
awk 'NR==FNR{ A[$0]++; next }
    { for (i=1;i<=NF;i++)
        {
            if ($i in A) if (i == 1 || i == NF) { E[$i]++ } else { I[$i]++ }
        }
}
END { for (e in E) print e; print "----------------"; for (i in I) print i }
' file1 file2

Demo:
Code:
$ awk 'NR==FNR{ A[$0]++; next }
>     { for (i=1;i<=NF;i++)
>         {
>             if ($i in A) if (i == 1 || i == NF) { E[$i]++ } else { I[$i]++ }
>         }
> }
> END { for (e in E) print e; print "----------------"; for (i in I) print i }
> ' file1 file2
fish
dog
duck
cow
cat
----------------
dog
crow
cat
fox
$

If the output is correct, in the awk END section you can simply change print e to print e >> "file.ext" and print i to print i >> "file.int" to redirect the output to appropriate files.

Hope I didn't miss the point.
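
In the demo, dog and cat show up in both groups, whereas post #1 wants a word with any middle occurrence to count as internal only. One way to reconcile the two, sketched here under that reading of the requirement, is to keep the same E and I counters and skip any E entry that is also in I when printing. The sample files are recreated in the current directory, and file.ext and file.int are the output names suggested above.

```shell
# Recreate the sample files from post #1 (skip this if you already have them).
printf '%s\n' dog cow fox cat fish duck crow > file1
printf '%s\n' 'cow cat fox dog' 'cow fox dog' 'fish crow fox dog cat' \
              'cat dog duck' 'cow' > file2

rm -f file.ext file.int                      # avoid leftovers from earlier runs

awk 'NR == FNR { A[$0]++; next }             # file1: the words to classify
     { for (i = 1; i <= NF; i++)
           if ($i in A) {
               if (i == 1 || i == NF) { E[$i]++ } else { I[$i]++ }
           }
     }
     END { for (e in E) if (!(e in I)) print e > "file.ext"   # edge-only words
           for (i in I) print i > "file.int"                  # any middle hit
     }' file1 file2
```

Inside awk, `>` truncates the file only when it is first opened and then keeps writing to it for the rest of the run, so `>>` is needed only if results from several runs should accumulate.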
# 6  
Old 12-11-2014
The logic I used was to test for each target (from file1) in each line of file2; if the target
exists in a line and is not in the first or last position, then it has to be internal. Print it and get the next
target.
Code:
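#!/bin/bash
# anim1 and anim2 correspond to file1 and file2 from post #1

while read -r X
do
    outfil=external.txt
    while read -r line
    do
        set -- $line            # split the line into positional parameters
        # whole-word tests: X occurs in the line but is neither first nor last
        if [[ " $* " == *" $X "* && $X != "$1" && $X != "${@: -1}" ]]; then
            outfil=internal.txt
            break
        fi
    done  < anim2
# to write to the files instead, use: echo "$X" >> "$outfil"
    echo "$X >> $outfil"
done < anim1

#eof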

# --- output ---
# dog >> internal.txt
# cow >> external.txt
# fox >> internal.txt
# cat >> internal.txt
# fish >> external.txt
# duck >> external.txt
# crow >> internal.txt

This assumes that every item in file1 occurs somewhere in file2; an item that never occurs would be reported as external by default.
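
If that assumption is in doubt, the word list can be checked against file2 first. The sketch below is one way to do it with awk; the extra word emu and the scratch file missing.txt are made up for this example, and the files are created in the current directory under the names file1 and file2 from post #1.

```shell
# Sample data from post #1, plus one hypothetical word (emu) that is not in file2.
printf '%s\n' dog cow fox cat fish duck crow emu > file1
printf '%s\n' 'cow cat fox dog' 'cow fox dog' 'fish crow fox dog cat' \
              'cat dog duck' 'cow' > file2

# Read file2 first and remember every word in it, then print the
# non-empty lines of file1 whose word was never seen.
awk 'NR == FNR { for (j = 1; j <= NF; j++) present[$j]; next }
     NF && !($1 in present)' file2 file1 > missing.txt

cat missing.txt    # prints: emu
```

Anything listed in missing.txt can then be dropped from the word list or reported separately before classifying the rest.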

Last edited by ongoto; 12-11-2014 at 05:20 PM.. Reason: opted for 'while read X' over 'for X'