remove chunks of text from file


# 1  
All,

So, I have an LDIF file that contains about 6500 users' worth of data. Some users have a block of text I'd like to remove, while others don't.

Example (block of text in question is the block starting with "authAuthority: ;Kerberosv5"):

User with text block:

Code:
# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUW
 VBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaH
 R0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJ
 zaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4K
 CTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35
  1423486873699801821345071757674738484067280188359389504392445041998105914670
 84867869429532763785664902803450035110236201552277202539905523086333992178101
 54867353409493808376385021788117196022631658234104675864712197939496802664455
 87225827331332464303631278838001920713257416459820742251056515142078124405645
 79 root@example.com:123.456.789.111
authAuthority: ;Kerberosv5;0x470bb9eb325f31c3000040ee00002257;username@EXAMPLE.C
 OM;EXAMPLE.COM;1024 35 142348687369980182134507175767473848406728
 01883593895043924450419981059146708486786942953276378566490280345003511023620
 15522772025399055230863339921781015486735340949380837638502178811719602263165
 82341046758647121979394968026644558722582733133246430363127883800192071325741
 645982074225105651514207812440564579 root@example.com:123.456.789.111
userPassword:: KioqKioqKio=
uid: username
cn: Firstname Lastname
gidNumber: 1029
givenName: Firstname
sn: Lastname
apple-user-homeurl:: PGhvbWVfZGlyPjx1cmw+YWZwOi8vamRhdGExLnVvcmVnb24uZWR1L1VzZ
 XJzPC91cmw+PHBhdGg+c3R1cmNvPC9wYXRoPjwvaG9tZV9kaXI+
homeDirectory: /Network/Servers/example.com/Users/username
apple-user-homequota: 4294967296
mail: username@example.com

Now, one problem is that ldapsearch/ldapdump wrap long attribute values at 76 characters, so the block in question is really a single line that has been folded across several.

So I'm curious whether there's an easy way to either (A) remove the line breaks within the blocks of text, i.e. any line that starts with a " " should have that space removed and be joined to the line above (one line starts with two spaces, and should have only one of them removed before being joined to the previous line), or (B) just nuke the whole block of text that starts with "authAuthority: ;Kerberosv5" and ends with "example.com:123.456.789.111".

Anyone have any ideas? (BTW, I realize that the line breaks aren't at exactly 76 characters anymore, since I had to sanitize the text to remove any personal info.)

Last edited by staze; 07-30-2009 at 12:46 AM..
# 2  
Can you post the input and the desired output as well? That way, you'll get a response quickly.
# 3  
So, the output would look like this...

Don't much care how we get there.

Code:
# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUW
 VBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaH
 R0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJ
 zaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4K
 CTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35
  1423486873699801821345071757674738484067280188359389504392445041998105914670
 84867869429532763785664902803450035110236201552277202539905523086333992178101
 54867353409493808376385021788117196022631658234104675864712197939496802664455
 87225827331332464303631278838001920713257416459820742251056515142078124405645
 79 root@example.com:123.456.789.111
userPassword:: KioqKioqKio=
uid: username
cn: Firstname Lastname
gidNumber: 1029
givenName: Firstname
sn: Lastname
apple-user-homeurl:: PGhvbWVfZGlyPjx1cmw+YWZwOi8vamRhdGExLnVvcmVnb24uZWR1L1VzZ
 XJzPC91cmw+PHBhdGg+c3R1cmNvPC9wYXRoPjwvaG9tZV9kaXI+
homeDirectory: /Network/Servers/example.com/Users/username
apple-user-homequota: 4294967296
mail: username@example.com

Notice the "authAuthority: ;Kerberosv5" section is gone.

Thanks!

Last edited by staze; 07-30-2009 at 12:46 AM..
# 4  
This will remove the line breaks: every line that starts with a " " has that one space removed and is joined to the previous line.
Code:
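sed -n '
# Gather the whole file in the hold space: copy line 1, append the rest
1h
1!H
${
# On the last line, bring the gathered text back into the pattern space
x
# A newline followed by one space marks a folded line; join it to the line above
s/\n //g
p
}' filename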
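To try the unfolding on something small before pointing it at the real file, here is a minimal sketch. The sample data is made up (shortened key digits, a hypothetical temp file), and the sed script is a one-line variant of the same hold-space idea rather than anything specific to this LDIF.

```shell
# Hypothetical folded sample: one continuation line starts with two spaces,
# the other with one, as in the ApplePasswordServer block above.
sample=$(mktemp)
printf '%s\n' \
  'uidNumber: 20192' \
  'authAuthority: ;ApplePasswordServer;0x47,1024 35' \
  '  1423' \
  ' 79 root@example.com:123.456.789.111' \
  'uid: username' > "$sample"

# Collect the file in the hold space, then delete every "newline + one space".
unfolded=$(sed -n '1h;1!H;${x;s/\n //g;p;}' "$sample")
printf '%s\n' "$unfolded"
rm -f "$sample"
```

Only one space is removed per continuation line, so the line that began with two spaces keeps the space that separates "35" from the key digits. Note that this approach holds the entire file in memory at once.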

This will remove that chunk:
Code:
sed '/authAuthority: ;Kerberosv5/,/ root@example.com/d' filename

# 5  
Wow, great.

So, the first one to remove the line breaks works great. The second one, which removes the chunk, doesn't seem to work. The resulting output is significantly truncated (it seems to be removing a very large portion of the file).

Before the second command, the file is 181095 lines. After the sed command, it's 2652. Since there are 6411 users in the ldif file, theoretically, that sed command should only be removing one line per user, so the output should be about 175k lines.
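That arithmetic can be checked before deleting anything: count the lines that should go and subtract them from the total. A rough sketch on a made-up, already unfolded miniature file (the users alice and bob and the shortened key are assumptions for illustration):

```shell
# Hypothetical unfolded LDIF with two users, one Kerberos line each.
ldif=$(mktemp)
for u in alice bob; do
  printf '%s\n' \
    "dn: uid=$u,cn=users,dc=example,dc=com" \
    'authAuthority: ;ApplePasswordServer;0x01,1024 35 1423 root@example.com:123.456.789.111' \
    "authAuthority: ;Kerberosv5;0x01;$u@EXAMPLE.COM;EXAMPLE.COM;1024 35 1423 root@example.com:123.456.789.111" \
    "uid: $u" \
    ''
done > "$ldif"

total=$(wc -l < "$ldif")                                    # lines before the delete
kerberos=$(grep -c '^authAuthority: ;Kerberosv5' "$ldif")   # lines that should go
expected=$((total - kerberos))                              # lines that should remain
echo "total=$total kerberos=$kerberos expected_after_delete=$expected"
rm -f "$ldif"
```

If the line count after the sed command is far below the expected figure, the command is deleting more than the Kerberos lines.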

sed should stop its pattern match at the first "stop" it sees. Could that not be working?

For example...

Here's the output after taking my "input" and running it through the "de-space/de-return" sed:
Code:
# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaHR0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJzaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4KCTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@example.com:123.456.789.111
authAuthority: ;Kerberosv5;0x470bb9eb325f31c3000040ee00002257;username@EXAMPLE.COM;EXAMPLE.COM;1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@example.com:123.456.789.111
userPassword:: KioqKioqKio=
uid: username
cn: Firstname Lastname
gidNumber: 1029
givenName: Firstname
sn: Lastname
apple-user-homeurl:: PGhvbWVfZGlyPjx1cmw+YWZwOi8vamRhdGExLnVvcmVnb24uZWR1L1VzZXJzPC91cmw+PHBhdGg+c3R1cmNvPC9wYXRoPjwvaG9tZV9kaXI+
homeDirectory: /Network/Servers/example.com/Users/username
apple-user-homequota: 4294967296
mail: username@example.com

Now here's what comes out after the sed to remove the kerberos block (`sed '/authAuthority: ;Kerberosv5/,/ root@example.com/d' test2.ldif`):
Code:
# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaHR0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJzaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4KCTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@example.com:123.456.789.111

So obviously, something is not quite right...

Thanks!

---------- Post updated 07-30-09 at 10:59 AM ---------- Previous update was 07-29-09 at 07:43 PM ----------

Ah ha... okay, so after removing the line breaks, the command should be:

Code:
sed '/authAuthority: ;Kerberosv5/d' filename

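Obviously (now): after unfolding, the whole block is one line, so the start and end patterns both sit on that same line. sed only starts looking for the end of a range on the line after the one that matched the start, so each range ran on to the next line containing " root@example.com" and took everything in between with it.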
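The behaviour of the two forms can be reproduced on a miniature file. This is only a sketch: the users alice and bob, the shortened key, and the temp files are made up for illustration.

```shell
folded=$(mktemp)
unfolded=$(mktemp)

# Two hypothetical users, each with both authAuthority blocks folded once.
for u in alice bob; do
  printf '%s\n' \
    "dn: uid=$u,cn=users,dc=example,dc=com" \
    'authAuthority: ;ApplePasswordServer;0x01,1024 35' \
    ' 1423 root@example.com:123.456.789.111' \
    "authAuthority: ;Kerberosv5;0x01;$u@EXAMPLE.COM;EXAMPLE.COM;1024 35" \
    ' 1423 root@example.com:123.456.789.111' \
    "uid: $u" \
    ''
done > "$folded"

# Unfold the continuation lines into a second file.
sed -n '1h;1!H;${x;s/\n //g;p;}' "$folded" > "$unfolded"

# Range delete on folded input: start and end are on different lines.
range_folded=$(sed '/authAuthority: ;Kerberosv5/,/ root@example.com/d' "$folded" | wc -l)

# Same range on unfolded input: the end is only found on a later line,
# so the range swallows everything up to the next user's first authAuthority.
range_unfolded=$(sed '/authAuthority: ;Kerberosv5/,/ root@example.com/d' "$unfolded" | wc -l)

# Single-address delete on unfolded input: removes just the matching lines.
single_unfolded=$(sed '/authAuthority: ;Kerberosv5/d' "$unfolded" | wc -l)

echo "range/folded=$range_folded range/unfolded=$range_unfolded single/unfolded=$single_unfolded"
rm -f "$folded" "$unfolded"
```

On this sample the range form keeps 10 of 14 lines on the folded file but only 2 of 10 on the unfolded one, while the single-address form keeps 8 of 10. So either command works; which one depends on whether the file has been unfolded first.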

Last edited by staze; 07-30-2009 at 12:53 AM..
# 6  
Follow up question...

So, now that that's working, here's another one.

Assuming I have "username", and "REALM", can anyone think of a good way to turn this:

Code:
 ;ApplePasswordServer;0x49e8c2c668dbcb0200004a090000342a,1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@ldap.example.com:123.456.789.111

Into this:

Code:
 ;Kerberosv5;0x49e8c2c668dbcb0200004a090000342a;username@REALM.EXAMPLE.COM;REALM.EXAMPLE.COM;1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@ldap.example.com:123.456.789.111

You'll notice that the hex value following ;ApplePasswordServer; and ;Kerberosv5; is the same, as is the section beginning with 1024 and ending with root@ldap.example.com:123.456.789.111. So basically, I need to swap out ApplePasswordServer for Kerberosv5, turn the comma after the hex value into a semicolon, and add the username and REALM info.

I can think of a few ways to do this with PHP, but I'm not particularly good with regex, and it would be nice to do this all in (ba)sh.
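One possible starting point in plain sh, offered as a sketch rather than a finished solution: capture the hex value with a sed group and rebuild the prefix around it. The variable names user and realm are assumptions, the key is shortened to a few digits, and values containing "/", "&" or a backslash would need escaping before being placed in the replacement.

```shell
user='username'             # assumed to be known already
realm='REALM.EXAMPLE.COM'   # assumed Kerberos realm

# Shortened, hypothetical ApplePasswordServer value.
aps=' ;ApplePasswordServer;0x49e8c2c668dbcb0200004a090000342a,1024 35 1423 root@ldap.example.com:123.456.789.111'

# \(0x[0-9a-fA-F]*\) captures the hex ID; the comma after it becomes
# ";user@REALM;REALM;" and the rest of the line is left untouched.
krb=$(printf '%s\n' "$aps" |
  sed "s/;ApplePasswordServer;\(0x[0-9a-fA-F]*\),/;Kerberosv5;\1;${user}@${realm};${realm};/")
printf '%s\n' "$krb"
```

This handles a single value. Applying it across a whole LDIF would also mean picking up each entry's own uid, which the sketch does not attempt.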

Thanks!

Last edited by staze; 08-04-2009 at 03:23 PM..
# 7  
Quote:
Originally Posted by staze
So, now that that's working, here's another one.
Instead of us doing your work for you, why not give it a shot yourself, run it, debug it, and come to us with specific questions?

Regards