Extract lines from files

# 8  
08-14-2009
Extract sentences from files


There is a slight change in the problem definition. The first file, me.txt, is in the following format:

0.01474818  0.16279070  M1   E1
0.01081743  0.11940299  M2   E2
0.00765600  0.08450704  M3   E3
0.00441931  0.04878049  M4   E5
0.01904574  0.21022727  M5   E10
0.00510400  0.05633803  M6   E12
0.00905960  0.10000000  M7   E16
0.00799376  0.08823529  M8   E17
0.00424669  0.04687500  M9   E18
0.01317759  0.14545455  M12  E19
0.00403645  0.04455446  M13  E20
0.01041333  0.11494253  M16  E21
0.00683743  0.07547170  M17  E22
0.00734562  0.08108108  M18  E23

I have attached a sample input file (E_Sentences_input.txt) and the expected output file (E_Sentence_expected_out.txt). Using the 4th column of the first file (me.txt), I want to extract the matching sentences in the format shown in the expected output. I would like to write the code in Perl. Thanks in advance.

# 9  
08-17-2009

Except for some similar terms, this looks like a different problem.

You have two lists of tags, both in increasing order, where each tag is "E" followed by a decimal number, e.g. "E1", "E9", etc. The first list is a column in a file of other data; the second is a prefix on lines (some quite long) in the data file.

You seem to want the tagged lines in the data file to be copied to STDOUT. Because both lists are in increasing order, this is basically a copy of tagged lines as selected by the first list, with the tag omitted, and with an empty line between them.

Does that describe the situation? ... cheers, drl
# 10  
08-18-2009
Exactly: copy only the selected tagged lines (i.e. not all the lines in the tagged file) to STDOUT, omitting the tags, by looking up the 4th column of the first list in sequential order (E1, E2, E3, E5, E10, etc.).
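Because both lists are in increasing order, the selection can also be done in a single merge-style pass that needs no hash and stops as soon as the selectors run out. The Perl below is a minimal sketch of that idea, not the code used later in this thread. It assumes every tag is "E" followed by digits and that each data line is "TAG text" with a single space after the tag. The subroutine names tag_number and select_tagged, and the sample sentences, are hypothetical.

```perl
#!/usr/bin/env perl
# Sketch: select tagged lines with one merge-style pass, relying on both
# tag lists being in increasing numeric order (E1, E2, E3, E5, E10, ...).
use strict;
use warnings;

# Numeric part of a tag such as "E10" (assumes the format E<digits>).
sub tag_number {
    my ($tag) = @_;
    my ($n) = $tag =~ /^E(\d+)$/
        or die "Unexpected tag format: $tag\n";
    return $n;
}

# Return the text of each data line whose tag appears in @$selectors.
# Both $selectors and $lines must be sorted by tag number, ascending.
sub select_tagged {
    my ( $selectors, $lines ) = @_;
    my @out;
    my $i = 0;    # current position in @$selectors
    for my $line (@$lines) {
        my $copy = $line;
        chomp $copy;
        next unless length $copy;    # skip blank lines
        my ( $tag, $text ) = split / /, $copy, 2;
        my $n = tag_number($tag);

        # Advance past selectors that have no matching data line.
        $i++ while $i < @$selectors && tag_number( $selectors->[$i] ) < $n;
        last if $i >= @$selectors;    # selectors exhausted: nothing left to find

        if ( tag_number( $selectors->[$i] ) == $n ) {
            push @out, defined $text ? $text : '';
            $i++;
        }
    }
    return @out;
}

# Hypothetical sample data standing in for me.txt's 4th column and the
# tagged sentences file.
my @selectors = qw(E1 E2 E3 E5 E10);
my @data      = (
    "E1 First sentence.",
    "E2 Second sentence.",
    "E3 Third sentence.",
    "E4 Not selected.",
    "E5 Fifth sentence.",
    "E9 Also not selected.",
    "E10 Tenth sentence.",
    "E11 Past the last selector.",
);

# Same layout as requested: each selected line followed by an empty line.
print "$_\n\n" for select_tagged( \@selectors, \@data );
```

To use it on the real files, @selectors would be filled from the 4th column of me.txt (e.g. `(split)[3]` per line) and @data from the lines of the sentences file. The trade-off is that a merge pass silently misses matches if either list is ever out of order, whereas a hash lookup does not depend on order at all.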
# 11  
08-18-2009

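This shell script (s2) drives the Perl script (p2) and compares the generated output with your posted expected output. Here data1 is your me.txt, data2 is the tagged sentences input file, and expected-output.txt is your expected output: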
#!/usr/bin/env bash

# @(#) s2	Demonstrate exercise of selector perl code.

set +o nounset
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) perl cmp sdiff diff
set -o nounset

echo " Lines in index file: $(wc -l <data1)"
echo " Lines in expected output: $(wc -l <expected-output.txt)"

echo " Results:"
./p2 > t1
echo " Lines in output file: $(wc -l <t1)"
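# cmp reports the first differing byte; sdiff -s shows only the differing lines side by side.
if cmp t1 expected-output.txt
then
  echo " Files are the same."
else
  echo " Files differ."
  sdiff -w78 -s t1 expected-output.txt
fi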

exit 0

The Perl script (p2):

#!/usr/bin/env perl

# @(#) p2	Demonstrate selection of tagged lines.

use warnings;
use strict;

my ($debug);
$debug = 1;
$debug = 0;

my ( $f1, $f2, %selectors, @parts, $t1 );

open( $f1, "<", "data1" ) || die(" Cannot open data1.\n");

# Get the tags (4th column) into hash selectors.

while (<$f1>) {
  $t1 = (split)[3];
  $selectors{$t1}++ if defined $t1;
}
print " selectors is :", join( " ", sort keys %selectors ), ":\n" if $debug;
close $f1;

open( $f2, "<", "data2" ) || die(" Cannot open data2.\n");

# Read data file of tagged lines, check for existence in selectors hash.

while (<$f2>) {
  chomp;    # drop the newline so each entry is exactly "text", blank line
  @parts = split( / /, $_, 2 );
  print " Working on tag $parts[0]\n" if $debug;
  if ( not exists( $selectors{ $parts[0] } ) ) {
    print " Skipping tagged line $parts[0]\n" if $debug;
    next;
  }
  print "$parts[1]\n\n";
}
close $f2;

exit(0);

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
perl 5.10.0
cmp (GNU diffutils) 2.8.1
sdiff (GNU diffutils) 2.8.1
diff (GNU diffutils) 2.8.1
 Lines in index file: 14
 Lines in expected output: 28

 Lines in output file: 28
t1 expected-output.txt differ: char 744, line 9
 Files differ.
This was disclosed by ATSUM informati |	This was disclosed by ATSUM informati
During the meeting today, the delegat |	During the meeting today, the delegat

The two lines that differ do so because, in your expected output file, those lines contain extra embedded spaces that are not in the source file.

Best wishes ... cheers, drl
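If the extra embedded spaces in the expected output are not significant, the comparison can be made tolerant of them. This is a minimal sketch in Perl, assuming that a run of spaces or tabs should compare equal to a single space and that trailing whitespace does not matter; the script name wscmp.pl and its subroutines are hypothetical. It would treat the two lines above as equal only if the extra spaces lengthen existing runs of spaces rather than adding spaces where there were none.

```perl
#!/usr/bin/env perl
# Sketch: compare two files line by line, treating any run of spaces or
# tabs as a single space and ignoring trailing whitespace.
use strict;
use warnings;

# Collapse runs of blanks and strip trailing whitespace (including "\n").
sub normalize {
    my ($s) = @_;
    $s =~ s/[ \t]+/ /g;
    $s =~ s/\s+\z//;
    return $s;
}

# 1-based numbers of the lines that still differ after normalization.
# A line present in only one of the two lists counts as a difference.
sub differing_lines {
    my ( $left, $right ) = @_;
    my $max = @$left > @$right ? scalar @$left : scalar @$right;
    my @diff;
    for my $i ( 0 .. $max - 1 ) {
        my $l = defined $left->[$i]  ? normalize( $left->[$i] )  : undef;
        my $r = defined $right->[$i] ? normalize( $right->[$i] ) : undef;
        push @diff, $i + 1 if !defined $l || !defined $r || $l ne $r;
    }
    return @diff;
}

# Read all lines of a file, newlines included.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Usage: perl wscmp.pl t1 expected-output.txt
if ( @ARGV == 2 ) {
    my @d = differing_lines( [ read_lines( $ARGV[0] ) ], [ read_lines( $ARGV[1] ) ] );
    if (@d) {
        print "Files differ at lines: @d\n";
    }
    else {
        print "Files match, ignoring differences in embedded spacing.\n";
    }
}
```

Inside the s2 driver, GNU `diff -b` (`--ignore-space-change`) gives a comparable whitespace-tolerant check without any extra code.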
# 12  
08-23-2009
Thanks a lot. It was an excellent piece of work. It worked.

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Extract the same lines from the two files

I used to use this script to extract the same lines from two files: grep -f file1 file2 > outputfile now I have file1 AB029895 AF208401 AF309648 AF526378 AJ444445 AJ720950 AJ851546 AY568629 AY591907 AY994087 BU116401 BU116599 BU119689 BU121308 BU125622 BU231446 BU236750 BU237045 (4 Replies)
Discussion started by: yuejian

2. Shell Programming and Scripting

Extract lines that appear twice

I have a text file that looks like this : root/user/usr1/0001/abab1* root/user/usr1/0001/abab2* root/user/usr1/0002/acac1* root/user/usr1/0002/acac2* root/user/usr1/0003/adad1* root/user/usr1/0004/aeae1* root/user/usr1/0004/aeae2* How could I code this to extract just the subjects... (9 Replies)
Discussion started by: LeftoverStew

3. Shell Programming and Scripting

ksh sed - Extract specific lines with multiple occurrences of interesting lines

Data file example I look for primary and * to isolate the interesting slot number. slot=`sed '/^primary$/,/\*/!d' filename | tail -1 | sed s'/*//' | awk '{print $1" "$2}'` Now I want to get the Touch line for only the associate slot number, in this case, because the asterisk... (2 Replies)
Discussion started by: popeye

4. Shell Programming and Scripting

Extract lines from text files

I have some files containing the following data # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA 1 196 A M 0 0 230 0, 0.0 2,-0.2 0, 0.0 0, 0.0 0.000 360.0 360.0 360.0 76.4 21.7 -6.8 11.3 2 197 A D + 0 0 175 1,-0.1 2,-0.1 0, 0.0 0, 0.0... (10 Replies)
Discussion started by: edweena

5. Shell Programming and Scripting

Can you extract (remove) lines from log files?

I use "MineOS" (a linux distro with python scripts and web ui included for managing a Minecraft Server). The author of the scripts is currently having a problem with the Minecraft server log file being spammed with certain entries. He's working on clearing up the spam. But in the meantime, I'm... (8 Replies)
Discussion started by: nbsparks

6. Shell Programming and Scripting

Search for a pattern,extract value(s) from next line, extract lines having those extracted value(s)

I have hundreds of files to process. In each file I need to look for a pattern then extract value(s) from next line and then search for value(s) selected from point (2) in the same file at a specific position. HEADER ELECTRON TRANSPORT 18-MAR-98 1A7V TITLE CYTOCHROME... (7 Replies)
Discussion started by: AshwaniSharma09

7. UNIX for Dummies Questions & Answers

Extract lines with specific words plus 2 lines before and after

Dear all, Greetings. I would like to ask for your help to extract lines with specific words, plus the 2 lines before and after these lines, by using awk or sed. For example, the input file is: 1 ak1 abc1.0 1 ak2 abc1.0 1 ak3 abc1.0 1 ak4 abc1.0 1 ak5 abc1.1 1 ak6 abc1.1 1 ak7... (7 Replies)
Discussion started by: Amanda Low

8. Shell Programming and Scripting

How to extract lines between tags into different files?

I have an xml file with the below data: unix>Cat address.xml <Address City=”Amsterdam” Street = “station straat” ZIPCODE="2516 CK " </Address> <Address City=”Amsterdam” Street = “Leeuwen straat” ZIPCODE="2517 AB " </Address> <Address City=”The Hauge” Street = “kirk straat” ... (1 Reply)
Discussion started by: LinuxLearner

9. Shell Programming and Scripting

extract nth line of all files and print in output file on separate lines.

Hello UNIX experts, I have 124 text files in a directory. I want to extract the 45678th line of all the files sequentially by file name. The extracted lines should be printed in the output file on separate lines. e.g. The input Files are one.txt, two.txt, three.txt, four.txt The cat of four... (1 Reply)
Discussion started by: yogeshkumkar

10. Shell Programming and Scripting

is it hard to extract particular lines & strings from the files??

Hi Experts, I have lots of big files. Below is a snapshot of one file. From the files I want to extract information like below. What could be the command or script for that? DELETE RESP:940120105 CREATE RESP:0 GET RESP:0 File contains like below- ... ... <log... (8 Replies)
Discussion started by: thepurple