
Using a regular expression to remove tags


ThePsion5

Hi, I'm using regular expressions in PHP to remove duplicate XML tags from a document, and I'm trying to use the following regular expression to do so:

(</?[^\W]*>)\1{1,1}
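For reference, here is a minimal PHP sketch of what this pattern does, assuming the PCRE semantics that `preg_replace` uses. `(</?[^\W]*>)` captures an opening or closing tag whose name contains only word characters, and `\1` requires exactly the same text to follow immediately. `[^\W]` is `\w` written as a double negative, and `{1,1}` ("exactly once") is already the default. The sketch therefore checks that a simplified pattern behaves the same way. The second sample is an abbreviated stand-in for the longer test string, not the original text.

```php
<?php
// The pattern from the post, with the ~ delimiters used in the function code.
$original = '~(</?[^\W]*>)\1{1,1}~';
// [^\W] ("not a non-word character") is the same class as \w, and {1,1}
// is the default quantifier, so this simplified form should be equivalent.
$simplified = '~(</?\w*>)\1~';

$samples = [
    '<good>sdjfkadsf</good><bad><bad>randomy stuff</good></doom>',
    '<df><block><block>Lorem ipsum dolor sit amet.</df></df>', // abbreviated
];

foreach ($samples as $s) {
    // Both patterns should strip exactly the same adjacent repeated tags.
    $a = preg_replace($original, '', $s);
    $b = preg_replace($simplified, '', $s);
    echo ($a === $b ? 'same' : 'DIFFERENT'), ': ', $a, PHP_EOL;
}
```

Because `\w` covers only letters, digits and underscore, neither pattern matches tags that contain hyphens or attributes, such as `<my-tag>` or `<a href="x">`.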

Now, this works with a test string like this one:

<good>sdjfkadsf</good><bad><bad>randomy stuff</good></doom>
The regular expression turns it into this:
<good>sdjfkadsf</good>randomy stuff</good></doom>
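A short PHP sketch reproduces this result (the variable names are illustrative). The replacement string is empty, so the whole match is deleted. That means both `<bad>` tags disappear, not just the extra copy. If the goal is to keep one copy, a `$1` replacement puts the captured tag back. The last case is an additional variation, not from the post: it uses `\1+` so that runs of three or more identical tags also collapse to one.

```php
<?php
$tagRegExp = '~(</?[^\W]*>)\1{1,1}~';
$input = '<good>sdjfkadsf</good><bad><bad>randomy stuff</good></doom>';

// Replacing with '' removes the whole match, i.e. BOTH copies of the repeated tag.
$bothRemoved = preg_replace($tagRegExp, '', $input);

// Replacing with '$1' re-inserts the captured tag once, keeping a single copy.
$oneKept = preg_replace($tagRegExp, '$1', $input);

// Variation (assumption, not from the post): \1+ swallows any number of repeats.
$collapsed = preg_replace('~(</?\w*>)\1+~', '$1', '<b><b><b>x');

echo $bothRemoved, PHP_EOL; // <good>sdjfkadsf</good>randomy stuff</good></doom>
echo $oneKept, PHP_EOL;     // <good>sdjfkadsf</good><bad>randomy stuff</good></doom>
echo $collapsed, PHP_EOL;   // <b>x
```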
But the same regular expression fails when I use it on this longer string:
<df><block><block>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pellentesque est. Donec facilisis, metus eget accumsan fermentum, dui diam eleifend sapien, vitae dictum nibh......Aliquam laoreet, purus sit amet adipiscing venenatis, purus nisl molestie ligula, id volutpat odio risus at eros. Integer nec purus. Proin rhoncus ornare orci. Etiam nonummy dictum elit. </df></df>
The regular expressions used to test both strings are identical in every way. Does anyone have an idea of what might be going on? Thanks in advance!
-Sean

I should rename this topic to "I am stupid".

Test code:

print htmlentities(preg_replace($tagRegExp, 'XXX', $currentTag));
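Replacing matches with a visible marker such as `'XXX'` and passing the result through `htmlentities` is a handy way to see what the pattern hits in a browser. The sketch below applies that idea to an abbreviated stand-in for the longer string. It also uses `preg_match_all` to count the matches. The variable names are illustrative. The key point is that the test runs on the same string the real function will receive.

```php
<?php
$tagRegExp = '~(</?[^\W]*>)\1{1,1}~';
// Abbreviated stand-in for the longer test string from the first post.
$longString = '<df><block><block>Lorem ipsum dolor sit amet.</df></df>';

// Replace each match with a visible marker instead of deleting it, then
// escape the markup so a browser displays the tags instead of parsing them.
$marked = preg_replace($tagRegExp, 'XXX', $longString);
print htmlentities($marked);
// &lt;df&gt;XXXLorem ipsum dolor sit amet.XXX

// preg_match_all returns the number of non-overlapping matches found.
$count = preg_match_all($tagRegExp, $longString, $matches);
echo PHP_EOL, $count, ' match(es): ', htmlentities(implode(', ', $matches[0]));
```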

My function code:

$tagRegExp = '~(</?[^\W]*>)\1{1,1}~';
$String = preg_replace($tagRegExp, '', $String);
return $String;
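One way to avoid this kind of mismatch is to wrap the function body in a real function and have the test call that function directly, rather than re-typing its contents. The name `removeDuplicateTags` is hypothetical, since the post only shows the body. The long sample is an abbreviated stand-in for the original string.

```php
<?php
/**
 * Hypothetical wrapper around the function body from the post: removes any
 * tag that is immediately followed by an identical copy of itself (both
 * copies are dropped, because the replacement string is empty).
 */
function removeDuplicateTags(string $String): string
{
    $tagRegExp = '~(</?[^\W]*>)\1{1,1}~';
    $String = preg_replace($tagRegExp, '', $String);
    return $String;
}

// Test through the function itself, so the test and the real code cannot
// drift apart (different variable, different replacement string, ...).
$short = '<good>sdjfkadsf</good><bad><bad>randomy stuff</good></doom>';
$long  = '<df><block><block>Lorem ipsum dolor sit amet.</df></df>'; // abbreviated

print htmlentities(removeDuplicateTags($short)) . PHP_EOL;
print htmlentities(removeDuplicateTags($long)) . PHP_EOL;
```

When the test calls the same function the application uses, a slip like the one described below shows up immediately.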

Anyone notice a slight inconsistency there? In conclusion: simple errors are embarrassing.
