Linux novice - Search and delete


 
# 1  
Old 01-19-2011

Hi Unix masters,

I need some guidance, or a small piece of code, to shed light on my problem.

Problem example:
I have 3 different ASCII text files. Inside each file, the text
contains repeated marks.

--text 1 start--
123 -> mark
anytextanytext
anythinganything

123 ->mark
blahblah
blah

...
123->mark
...

--text 1 end--

Each file is different, BUT some blocks of text after the marker are repeated
in the other files.

What I'm planning to do is merge all 3 files into 1, find the repeated blocks
between the marks, and delete the repeats, leaving only one copy of each.
The non-repeated blocks must not be deleted. I don't need to sort the
blocks. The result should be saved in a new file.

I couldn't find source code (in any language) or a program that does
what I need or could guide me. The example has only 3 files, but
I need to run the routine on 350 files with thousands of blocks inside each file *dies*.

Should I use bash, Perl, Python, Emacs, or some text processor? Does anyone
know of existing code close to what I need that I could download? My skills are enough
to modify a file, but for now I'm unable to code something from scratch.

Thanks in advance guys!
# 2  
Old 01-19-2011
Give us more clues: please provide a representative sample of input and expected output.
# 3  
Old 01-19-2011
Ok!

--File 1

123
asdf
qwert
zxcv

123
yuio
hjkl
bnm

--File 2
123
rtyu
fghj
vbnm

123
asdf
qwert
zxcv

--Result
123
yuio
hjkl
bnm

123
rtyu
fghj
vbnm

123
asdf
qwert
zxcv

*2 unique blocks + 1 copy of the repeated block
# 4  
Old 01-19-2011
Do your blocks have a fixed length (always 4 lines + 1 empty line)?
Or can some blocks have more or fewer than 4 lines?
# 5  
Old 01-19-2011
Code:
awk 'BEGIN{RS="";FS="\n"}
! a[$0]++ {b[$1]=b[$1]?b[$1] FS FS $0:$0} 
END {for ( i in b) print b[i] FS}' file1 file2
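
For anyone new to awk, here is my reading of how the one-liner above works, written out as a commented sketch. The wrapper name dedup_blocks is made up for this example; the awk program itself is unchanged. Setting RS to the empty string puts awk in "paragraph mode", so each blank-line-separated block is one record, and FS="\n" makes each line of the block one field.

```shell
# dedup_blocks FILE...: print each distinct blank-line-separated block once.
# (dedup_blocks is a hypothetical wrapper name, not part of the original post.)
dedup_blocks() {
    awk '
    BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per block, one field per line
    !a[$0]++ {                     # true only the first time this exact block text is seen
        # collect the kept blocks under their first line (the 123 marker),
        # separated from each other by one empty line
        b[$1] = b[$1] ? b[$1] FS FS $0 : $0
    }
    END { for (i in b) print b[i] FS }   # print each group, followed by an empty line
    ' "$@"
}
```

Usage would be something like `dedup_blocks file1 file2 > result.txt`. Two blocks count as duplicates only if their text is identical character for character, so a stray trailing space makes a block look unique. The `for (i in b)` loop visits markers in no guaranteed order, which should not matter here since every block starts with the same 123 line.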

# 6  
Old 01-20-2011
Hi Ctsgnb,
The blocks don't have a fixed size. The range is 3~15 lines. Some blocks repeat 2 or more times in the same file and in other files too. I have many files to process, so I guess I need to join them all into one and then process the repeated blocks. A block starts with the pattern 123 and ends with an empty line.

Rdcwayx, I will test the command you submitted. But I have thousands of files to process, so running it on pairs isn't fast enough.

I searched this forum some more and started to think about using awk to solve the problem. I have never used this tool. It looks very versatile.

Thanks, guys, for your patience and help with a novice.

---------- Post updated at 08:31 AM ---------- Previous update was at 05:45 AM ----------

Rdcwayx!

Sorry, I'm slow-minded as hell. I tested the command line. It worked.
I will check the consistency of the output file, but at a quick look it worked.

Next I will run it at large scale, on all files, and report the result.

---------- Post updated at 08:59 AM ---------- Previous update was at 08:31 AM ----------

God. It worked perfectly!
Thank you very much. I downloaded the awk manual and will start reading it right now.
It is a nice tool. It looks light and clean.

Thanks again, guys!

Last edited by atnoz; 01-20-2011 at 06:48 AM.. Reason: Add info
# 7  
Old 01-21-2011
If you have more files, just add the file names one by one.

Code:
awk 'BEGIN{RS="";FS="\n"}
! a[$0]++ {b[$1]=b[$1]?b[$1] FS FS $0:$0} 
END {for ( i in b) print b[i] FS}' file1 file2 file3 file4 ...

or, with less typing, let the shell expand a wildcard to all matching file names

Code:
awk 'BEGIN{RS="";FS="\n"}
! a[$0]++ {b[$1]=b[$1]?b[$1] FS FS $0:$0} 
END {for ( i in b) print b[i] FS}' file*
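
With thousands of files, a wildcard can in principle expand to more arguments than the system accepts on one command line. A possible workaround, only a sketch and not tested at that scale, is to let find feed the files through a pipe. The function name dedup_tree, the array name seen, and the idea that all input files sit under one directory are assumptions made for this example. The first awk prints an empty line before each file so that the last block of one file cannot run into the first block of the next; the second awk keeps the first copy of every block.

```shell
# dedup_tree DIR: print each distinct block found in the regular files under DIR once.
# (dedup_tree and seen are hypothetical names chosen for this sketch.)
dedup_tree() {
    # Stage 1: emit every file, each preceded by an empty line as a block separator.
    # Stage 2: paragraph mode again; print a block only the first time it is seen,
    #          followed by one empty line (ORS).
    find "$1" -type f -exec awk 'FNR == 1 { print "" } 1' {} + |
        awk 'BEGIN { RS = ""; ORS = "\n\n" } !seen[$0]++'
}
```

For example, `dedup_tree ./input > result.txt`. The result file should be written outside the input directory, otherwise find may pick it up as input. Blocks come out in order of first appearance, but the order in which find visits the files is not specified.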

 