Using a regular expression to remove tags

ThePsion5 · June 21, 2006

Hi, I'm using regular expressions in PHP to remove duplicate XML tags from a document, and I'm trying to use the following regular expression to do so:

(</?[^\W]*>)\1{1,1}

Now, this works with a test string like this one:

<good>sdjfkadsf</good><bad><bad>randomy stuff</good></doom>

It turns that into this:

<good>sdjfkadsf</good>randomy stuff</good></doom>
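For reference, here is a minimal sketch that runs the pattern against the first test string. The shorter `$simplified` pattern is an addition for comparison, not part of the original post. `[^\W]` is just another way of writing `\w`, and `{1,1}` is the default quantifier. Because the whole match (the tag plus its repeat) is replaced with an empty string, `<bad><bad>` disappears completely instead of collapsing to a single `<bad>`.

```php
<?php
// Pattern from the post: group 1 captures an opening or closing tag,
// and \1 requires an identical copy immediately after it.
$original   = '~(</?[^\W]*>)\1{1,1}~';
// Equivalent spelling (assumption for comparison): [^\W] == \w, {1,1} is implied.
$simplified = '~(</?\w*>)\1~';

$input = '<good>sdjfkadsf</good><bad><bad>randomy stuff</good></doom>';

// Replacing with '' drops both copies of the repeated tag.
$a = preg_replace($original, '', $input);
$b = preg_replace($simplified, '', $input);

echo $a, "\n"; // <good>sdjfkadsf</good>randomy stuff</good></doom>
var_dump($a === $b); // bool(true)
```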

But the same regular expression fails on this longer string:

<df><block><block>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pellentesque est. Donec facilisis, metus eget accumsan fermentum, dui diam eleifend sapien, vitae dictum nibh......Aliquam laoreet, purus sit amet adipiscing venenatis, purus nisl molestie ligula, id volutpat odio risus at eros. Integer nec purus. Proin rhoncus ornare orci. Etiam nonummy dictum elit. </df></df>

The regular expressions used in both tests are identical in every way. Does anyone have an idea of what might be going on? Thanks in advance!

-Sean
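A quick sanity check is to apply the pattern directly to the longer input. The sketch below uses a shortened stand-in for that string; it is an assumption, since only the duplicated tags matter to the pattern. The pattern matches both `<block><block>` and `</df></df>`. That suggests the regex itself is fine and the difference lies in how it is being called.

```php
<?php
$pattern = '~(</?[^\W]*>)\1{1,1}~';
// Shortened stand-in for the long lorem ipsum test string (hypothetical).
$input   = '<df><block><block>Lorem ipsum dolor sit amet.</df></df>';

// The 5th argument of preg_replace receives the number of replacements made.
$result = preg_replace($pattern, '', $input, -1, $count);

echo $result, "\n"; // <df>Lorem ipsum dolor sit amet.
echo $count, "\n";  // 2
```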

ThePsion5 · June 21, 2006

I should rename this topic to "I am stupid".

Test code:

print htmlentities(preg_replace($tagRegExp, 'XXX', $currentTag));

My function code:

$tagRegExp = '~(</?[^\W]*>)\1{1,1}~';
$String = preg_replace($tagRegExp, '', $String);
return $String;

Anyone notice a slight inconsistency there? In conclusion: simple errors are embarrassing.
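One way to avoid this kind of test-versus-function mismatch is to wrap the replacement in a function that takes the string as a parameter and returns the result. Then the test and the real code call exactly the same thing. This is a minimal sketch. The name `removeDuplicateTags` and the `$keepOne` flag are hypothetical and do not come from the thread. With `$keepOne = true`, the `$1` backreference keeps a single copy of the repeated tag instead of dropping both.

```php
<?php
/**
 * Hypothetical helper (name and $keepOne flag are illustrative).
 * Removes a tag that is immediately repeated, e.g. "<b><b>" or "</df></df>".
 * $keepOne = false drops both copies, as in the original post;
 * $keepOne = true collapses the pair to one copy via the $1 backreference.
 */
function removeDuplicateTags(string $string, bool $keepOne = false): string
{
    $tagRegExp   = '~(</?\w*>)\1~';
    $replacement = $keepOne ? '$1' : '';
    // Work on the parameter that was passed in and return the result directly.
    return preg_replace($tagRegExp, $replacement, $string);
}

$input = '<df><block><block>Lorem ipsum</df></df>';
echo removeDuplicateTags($input), "\n";       // <df>Lorem ipsum
echo removeDuplicateTags($input, true), "\n"; // <df><block>Lorem ipsum</df>
```

Note that `\1` matches exactly one repeat. A run of three identical tags is therefore only partly collapsed in a single pass. Changing the pattern to `(</?\w*>)\1+` would match the whole run.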
