Remove lines if some data is the same: Centos5 / bash


 
# 1  
Old 07-24-2012

OK, what I have is 8 separate files, split by how many IPs are associated with the domain. I want to eliminate duplication of IPs within each individual file (i.e. I don't care if the same IP appears in both the 1 IP file and the 4 IP file). Can someone help me go through the files and remove any line that has an IP already used on a previous line within that same file? In the 2 IP file, the What and Where lines should be removed, since each shares an IP with the Who line. In the 4 IP file, once the Spock line is removed (it shares an IP with Kirk), the file is then OK, because the McCoy line no longer duplicates 10.100.200.200. Hopefully this is understandable (it makes sense to me). I have been able to get rid of domains whose IPs are all identical by using sort -k? -k? -u to ignore the first field, but I can't figure out how to pull single IPs out of a line and test them against other lines.

I'm doing this on a CentOS 5.8 box in a bash script (whole lot more processing going on all around this portion).


EXAMPLES: Colon separated lines in each file
domain.com:ip:ip:

1 IP file

Any.com:192.168.10.100:
Where.edu:192.168.10.200:


2 IP file

Who.com:192.168.10.300:192.168.10.200:
What.gov:10.0.0.150:192.168.10.300:
Where.biz:192.168.10.200:10.10.0.10:
When.tv:192.168.10.10:192.168.10.11:

4 IP file

Kirk.ufp:10.0.100.100:10.0.200.100:10.0.200.200:10.0.100.200:
Spock.vsa:10.100.100.100:10.100.100.200:10.100.200.200:10.0.100.100
Mccoy.ama:10.100.200.200:192.168.200.200:192.168.100.200:192.168.100.201
# 2  
Old 07-24-2012
What output do you want from this input?
# 3  
Old 07-24-2012
The files should end up with only one line containing any given IP. It sounds confusing even to me, and I'm the one asking for help. If an IP is already associated with a domain, I don't want it listed again.

1 IP file (unchanged)

Any.com:192.168.10.100:
Where.edu:192.168.10.200:


2 IP file (What and Where lines removed)

Who.com:192.168.10.300:192.168.10.200:
When.tv:192.168.10.10:192.168.10.11:


4 IP file (Spock line removed)

Kirk.ufp:10.0.100.100:10.0.200.100:10.0.200.200:10.0.100.200:
Mccoy.ama:10.100.200.200:192.168.200.200:192.168.100.200:192.168.100.201
# 4  
Old 07-25-2012
Thanks anyway y'all. I decided to go another direction.

What I did was cat all the individual files together, twice: once keeping the complete lines, and once with just the IPs, sorted and made unique. Then I loop through the IP list, checking each IP against my output file to make sure it hasn't already been logged. If it hasn't, I grep for it in the full listing and tail the last matching entry (the line with the most IPs associated with that domain). Lather, rinse, repeat.

Then I go through the output file, count the colons in each line, and write each line to the respective file based on the number of IPs associated with the domain.

Code:
# build a unique list of every IP across all files
cat ?_servers_ips | sed 's/:/\n/g' | grep -Ev '^$|[[:alpha:]]' | sort -u > master_iplist
# full listing of all lines; the output file starts empty
cat ?_servers_ips > all_working_ips
cat /dev/null > working_output

for i in {1..8}; do
     cat /dev/null > "working_${i}_ips"
done

while read myIP
   do
       if ! grep -q "$myIP" working_output; then
           grep "$myIP" all_working_ips | tail -1 >> working_output
       fi
   done < master_iplist

while read testDomain
do
   count=$(grep -o ":" <<< "$testDomain" | wc -l)
   count=$((count - 1))
   if [ "$count" -ge 8 ]; then               # lump the couple of 8 or more into 1 file
        count=8
   fi
   printf '%s\n' "$testDomain" >> "working_${count}_ips"
done < working_output

I'm sure there are ways to improve this.
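One possible simplification, as a sketch only: awk can do the IP bookkeeping and the per-count dispatch in a single pass over the combined files. Note this is not an exact equivalent of the script above: it keeps the first line that uses each IP rather than grepping back for the line with the most IPs, and it assumes the `?_servers_ips` filename pattern from earlier.

```shell
# One pass over all input files: keep the first line that uses each IP,
# count the IPs on each surviving line, and append the line to
# working_N_ips, with N capped at 8.
awk -F: '{
    for (i = 2; i <= NF; i++)
        if ($i != "" && seen[$i]) next        # IP already logged -> skip line
    n = 0
    for (i = 2; i <= NF; i++)
        if ($i != "") { seen[$i] = 1; n++ }   # log this line IPs and count them
    if (n > 8) n = 8                          # lump 8 or more into one file
    print >> ("working_" n "_ips")
}' ?_servers_ips
```

This replaces the grep-per-IP loop (one pass over working_output per IP) with a single read of the data, which should matter if the files are large.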