cut a string in a textfile line per line


 
# 1  
Old 05-05-2009
cut a string in a textfile line per line

I need to cut a string out of each line of a text file, but each line has its own way of being cut (to a different length).

I have a for loop that reads the file line by line, and each line then has to be compared:

Code:
for x in `cat tmp2.txt`; do
       if[ "BAC" == _____  ]; then
               echo 'BAC'
       elif[ "sdf_ada" == _____]
         etcetc
       fi
done

Sample tmp2.txt file whose lines need to be trimmed to specific lengths:
BAC0904102604 -> BAC
AFR0904102604 -> AFR
HHF0904102604 -> HHF
UI100904102907 -> UI10
sdf_ada_9541-9543.1 -> sdf_ada
_fsf13145533 -> fsf

Note: the file is in random order.

Last edited by bakunin; 05-05-2009 at 07:07 AM.. Reason: added code-tags. Please do that yourself from now on. Thanks
# 2  
Old 05-05-2009
If there are 1 million lines in that file, how will you do the comparison?

Where will you get the number of characters to cut from?
# 3  
Old 05-05-2009
Quote:
Originally Posted by panyam
If there are 1 million lines in that file, how will you do the comparison?

Where will you get the number of characters to cut from?

Basically there are 18 patterns in all (18 specific lengths).
So what I want to do is loop over the file with a for loop to get each line, and there will be 18 if/elif/else statements.
But I don't know how exactly to cut a string so that it can be used as the condition in the if/else statements.

here is my pseudocode:

Code:
for x in `cat tmp2.txt`; do     #get line per line
       if[ "BAC" == _____  ]; then        
               echo 'BAC'
       elif[ "sdf_ada" == _____]
         etcetc
       elif[ ... ] 
       fi (18x)
done

Example of the else-if statements (obviously not working, invented):
Code:
if[ "BAC" == substr($1,1,3)  ]; then        
               echo 'BAC'
elif[ "sdf_ada" ==  substr($1,1,7) ]
               echo 'sdf_ada'
elif[ "UI10" == substr($1,1,4) ]
..... (18x)

So far this is the solution that I have in mind, although it is very inefficient.

P.S. I assure you the files will not reach 1 million lines; they are only about 20 lines long (maximum).
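
A minimal sketch of how the invented substr() comparisons above could be written in plain POSIX shell, assuming `cut -c` is acceptable for taking the first N characters. The helper names `first_n` and `classify` are made up for this example, and only three of the 18 prefixes are shown. Note the space after `if`/`elif`, the single `=` inside `[ ]`, and the `then` after every test.

```shell
#!/bin/sh
# first_n STRING N: print the first N characters of STRING.
# (Hypothetical helper name; cut -c itself is a standard utility.)
first_n() {
    printf '%s\n' "$1" | cut -c "1-$2"
}

# classify LINE: the if/elif chain from the pseudocode, with each
# substr($1,1,N) replaced by a call to first_n.
classify() {
    if [ "$(first_n "$1" 3)" = "BAC" ]; then
        echo 'BAC'
    elif [ "$(first_n "$1" 7)" = "sdf_ada" ]; then
        echo 'sdf_ada'
    elif [ "$(first_n "$1" 4)" = "UI10" ]; then
        echo 'UI10'
    fi
}

classify BAC0904102604         # prints BAC
classify sdf_ada_9541-9543.1   # prints sdf_ada
classify UI100904102907        # prints UI10
```

Each comparison starts a `cut` process, which is slow on big inputs but should be fine for a file of about 20 lines.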
# 4  
Old 05-05-2009
Simply put,
I need to match each line to its corresponding prefix (18 prefixes), so that I can count how many times each prefix appears in the text file.
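
One possible way to do the matching and counting without any substring cutting is a `case` statement with glob patterns, followed by `sort | uniq -c`. This is only a sketch: the function names are invented, the six patterns are the ones from the sample data (the other 12 are not given in the thread), and the `unknown` fallback is an assumption.

```shell
#!/bin/sh
# prefix_of LINE: print the prefix that LINE starts with.
prefix_of() {
    case $1 in
        BAC*)     echo BAC ;;
        AFR*)     echo AFR ;;
        HHF*)     echo HHF ;;
        UI10*)    echo UI10 ;;
        sdf_ada*) echo sdf_ada ;;
        _fsf*)    echo fsf ;;
        *)        echo unknown ;;   # hypothetical fallback for unmatched lines
    esac
}

# count_prefixes: read lines on stdin, print "prefix count" per prefix.
# `while read` takes one whole line at a time, whereas
# `for x in $(cat file)` would split lines on spaces.
count_prefixes() {
    while IFS= read -r line; do
        prefix_of "$line"
    done | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

printf '%s\n' BAC0904102604 UI100904102907 BAC0904102999 | count_prefixes
```

For a real file the last line would become `count_prefixes < tmp2.txt`.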
# 5  
Old 05-05-2009
given your sample tmp2.txt:

Code:
BAC0904102604 -> BAC
AFR0904102604 -> AFR
HHF0904102604 -> HHF
UI100904102907 -> UI10
sdf_ada_9541-9543.1 -> sdf_ada
_fsf13145533 -> fsf

and myTextFile:
Code:
BAC0904102604foo
AFR0904102604bar fred
sdf_ada_9541-9543.1foo
HHF0904102604function
UI100904102907brick
sdf_ada_9541-9543.1red
_fsf13145533black
BAC0904102604yellow
sdf_ada_9541-9543.1fred

nawk -f izuma.awk tmp2.txt myTextFile, where izuma.awk is:
Code:
# reading the FIRST file specified on the command line.
# create an array 'arr' indexed by the FIRST field ($1) of the file with the value from the THIRD
# field ($3) of the file. Jump to the 'next' record without executing the rest of the code.
FNR==NR { arr[$1]=$3; next }

# reading the SECOND file specified on the command line.
{

  # iterate through all the entries in array 'arr' - 'i' contains the INDEX to array 'arr'
  # substitute the value of 'i' at the beginning of the line (^) with the value of 'arr[i]'.
  # 'gsub' returns the NUMBER of substitutions performed (0 or 1 here, because of the '^' anchor)
  # add that number to an array 'sum' indexed by 'arr[i]'
  for( i in  arr)
     sum[arr[i]]+=gsub("^" i, arr[i])
}
END {

  # iterate through the array 'sum' outputting the total number of the performed substitutions
  # per pattern/prefix
  for (i in sum)
    printf("[%s] -> [%d]\n", i, sum[i])
}
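
For anyone who wants to try the script above, here is a self-contained sketch that writes the three files into a scratch directory and runs it. It assumes plain `awk` is available in place of `nawk`, that `mktemp -d` exists on the system, and that every line of tmp2.txt has a space on both sides of `->` (otherwise `$3` is empty). The output is piped through `sort` because the order of `for (i in sum)` is unspecified in awk.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

cat > tmp2.txt <<'EOF'
BAC0904102604 -> BAC
AFR0904102604 -> AFR
HHF0904102604 -> HHF
UI100904102907 -> UI10
sdf_ada_9541-9543.1 -> sdf_ada
_fsf13145533 -> fsf
EOF

cat > myTextFile <<'EOF'
BAC0904102604foo
AFR0904102604bar fred
sdf_ada_9541-9543.1foo
HHF0904102604function
UI100904102907brick
sdf_ada_9541-9543.1red
_fsf13145533black
BAC0904102604yellow
sdf_ada_9541-9543.1fred
EOF

# Same logic as the commented script, without the comments.
cat > izuma.awk <<'EOF'
FNR==NR { arr[$1]=$3; next }
{ for (i in arr) sum[arr[i]] += gsub("^" i, arr[i]) }
END { for (i in sum) printf("[%s] -> [%d]\n", i, sum[i]) }
EOF

result=$(awk -f izuma.awk tmp2.txt myTextFile | LC_ALL=C sort)
printf '%s\n' "$result"
```

With this sample data the counts should come out as 2 for BAC, 3 for sdf_ada, and 1 for each of AFR, HHF, UI10 and fsf.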


Last edited by vgersh99; 05-06-2009 at 08:16 AM.. Reason: comments
# 6  
Old 05-05-2009
Could you provide an explanation? :P

I was considering using a regex for this but don't know how.
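
On the regex idea: a rough sketch, assuming a `sed` that supports `-E` (extended regular expressions; GNU and BSD sed both do). The function name `prefix_by_regex` is invented, only the six prefixes from the sample are listed, and the optional leading `_` is there because `_fsf13145533` is supposed to map to `fsf`.

```shell
#!/bin/sh
# prefix_by_regex: read lines on stdin and print only the known prefix
# each line starts with. Lines matching no prefix are dropped (-n ... p).
prefix_by_regex() {
    sed -E -n 's/^_?(BAC|AFR|HHF|UI10|sdf_ada|fsf).*/\1/p'
}

printf '%s\n' BAC0904102604 sdf_ada_9541-9543.1 _fsf13145533 nomatch123 |
    prefix_by_regex
# prints BAC, sdf_ada and fsf, one per line
```

The output can then be piped to `sort | uniq -c` to get the number of lines per prefix.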
# 7  
Old 05-06-2009
added comments to the original post.