OK, what I have is 8 separate files, split by how many IPs are associated with each domain. I want to limit duplication of IPs within each individual file (i.e. I don't care if the same IP is in both the 1 IP file and the 4 IP file). Can someone help me with how to go through the files and remove lines that have an IP already appearing on a previous line (within that same file)? In the 2 IP file, the What and Where lines should be removed, since each has an IP that is on the Who line. In the 4 IP file, once the Spock line is removed (it has an IP from Kirk), that file is then OK, since the McCoy line is no longer duplicating 10.100.200.200. Hopefully this is understandable (it makes sense to me).
I have been able to get rid of domains that have all the same IPs associated by using sort -k? -k? -u to ignore the first field, but I can't figure out how to take a single IP from a line and test it against the other lines.
I'm doing this on a CentOS 5.8 box in a bash script (there's a whole lot more processing going on around this portion).
EXAMPLES: colon-separated lines in each file, in the form
domain.com:ip:ip:
1 IP file
Any.com:192.168.10.100:
Where.edu:192.168.10.200:
2 IP file
Who.com:192.168.10.300:192.168.10.200:
What.gov:10.0.0.150:192.168.10.300:
Where.biz:192.168.10.200:10.10.0.10:
When.tv:192.168.10.10:192.168.10.11:
4 IP file
Kirk.ufp:10.0.100.100:10.0.200.100:10.0.200.200:10.0.100.200:
Spock.vsa:10.100.100.100:10.100.100.200:10.100.200.200:10.0.100.100
Mccoy.ama:10.100.200.200:192.168.200.200:192.168.100.200:192.168.100.201
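Since this is part of a bash script, here is one possible sketch using awk: it keeps a table of every IP already seen in the current file and drops any later line containing one of them. The function name dedup_ips and the file names in the usage comment are made up for illustration, so adjust them to your setup:

```shell
# dedup_ips: print only those lines of a file whose IP fields (2..NF)
# have not appeared on an earlier KEPT line of the same file
dedup_ips() {
    awk -F: '{
        for (i = 2; i <= NF; i++)
            if ($i != "" && $i in seen) next   # duplicate IP: skip this line
        for (i = 2; i <= NF; i++)
            if ($i != "") seen[$i] = 1         # remember the IPs on this line
        print
    }' "$1"
}

# example usage -- file names are placeholders:
# for f in 1ip.txt 2ip.txt 4ip.txt 8ip.txt; do
#     dedup_ips "$f" > "$f.new" && mv "$f.new" "$f"
# done
```

Note that `next` fires before the line's IPs are recorded, so a dropped line never blocks later lines; that is exactly why McCoy survives once Spock is removed. The `$i != ""` test skips the empty field produced by the trailing colon.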