URGENT : REGEXP


Guest cute_chu

Hi, can anybody help explain what the following regular expression and its replacement mean?

var pattern=new RegExp("(^|\\t|,)(\"*|'*)(.*?)\\2(?=,|\\t|$)","g");
...
s=s.replace(pattern,"$3\t");
....

Hmm, I can help a little, but I can't tell you exactly what that regular expression does.

The pipe character - | - is for ORs and the parentheses define groups.

The first group - (^|\\t|,) - makes it so that the string is only matched if it either is at the beginning of the string (^), starts with a tab character (\t), or starts with a comma. The double backslash - \\ - is there because the pattern is written as a JavaScript string literal passed to new RegExp. Inside the string literal, \\ produces a single backslash, so the regex engine sees \t, which matches an actual tab character.

The second group - (\"*|'*) - matches an asterisk preceded by a double or single quote. "* or '* would be matched.

The third group - (.*?) - matches a dot (.) zero or more times. Don't ask me to define the difference between a "Greedy" and "Lazy" lookup. All I know is that this one is "Lazy".

The next part - \\2 - I can't quite explain. After the string literal is processed, the regex engine sees \2, which looks like a reference to the second group rather than a literal backslash followed by a 2.

The last group - (?=,|\\t|$) - is a look-ahead. It looks ahead in the string and only allows a match if what comes next is a comma, a tab character, or the end of the string ($). The look-ahead itself does not consume any characters.

Finally, s.replace(pattern, "$3\t") replaces each match with whatever the third group - (.*?) - captured, followed by a tab character. $3 refers to the third group.

That's about all the help I can offer. Check out this site for more info: http://www.regular-expressions.info/
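To make the escaping concrete, here is a small sketch of how the string literal turns into the regex the engine sees, and what the replace call does on a couple of inputs. The sample strings and the helper name toTabs are made up for illustration; they are not from the original post.

```javascript
// The pattern from the question, built from a string literal.
// Inside the string, "\\t" becomes \t (tab) and "\\2" becomes \2 (backreference).
var pattern = new RegExp("(^|\\t|,)(\"*|'*)(.*?)\\2(?=,|\\t|$)", "g");

// Hypothetical helper wrapping the replace call from the question.
function toTabs(s) {
  return s.replace(pattern, "$3\t");
}

// "\\t" in a string literal is two characters: a backslash and a "t".
var twoChars = "\\t".length;                            // 2
// As a regex, those two characters form \t, which matches a real tab...
var matchesTab = new RegExp("\\t").test("\t");          // true
// ...and not a literal backslash followed by "t".
var matchesBackslashT = new RegExp("\\t").test("\\t");  // false

// Comma-separated fields, one wrapped in double quotes (made-up input).
var out1 = toTabs('a,"b",c');   // "a\tb\tc\t"
// A comma inside a double-quoted field stays part of that field.
var out2 = toTabs('"x,y",z');   // "x,y\tz\t"

console.log(JSON.stringify(out1), JSON.stringify(out2));
```

Each match is replaced by the field's contents plus a tab, so on these inputs the result ends with a trailing tab. That suggests the code is converting comma-separated text to tab-separated text, though the original post does not say so.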

I think I can add a little more to that. About this:

The second group - (\"*|'*) - matches an asterisk preceded by a double or single quote. "* or '* would be matched.
An asterisk is a meta-character meaning "zero or more times" (a question mark means "zero or one time"). So the second group matches any number of either single or double quotes (but not a mix of both).

Then, about this:
The third group - (.*?) - matches a dot (.) zero or more times. Don't ask me to define the difference between a "Greedy" and "Lazy" lookup. All I know is that this one is "Lazy".
A dot represents any character, and the *? is a "lazy zero or more". Lazy basically means that it matches as few as possible, greedy matches as much as possible. So, this group matches any sequence of characters, up until the next part, which is given as \\2. This is a backreference to the second group, which is the series of quotes. It corresponds to the same string that matched the second group. So, these together:

(\"*|'*)(.*?)\\2

will match any number of the same quote character (0 or more), followed by any sequence of characters (including 0 characters), followed by the same sequence of quotes. So, these strings would match:

"""a d djeje 2 2j"""
'name'
'''''0'''''

But in these the quotes would not be treated as a matching pair, because each one mixes single and double quotes:

'"---'"
"''' 0 "'''

The empty string would also match, since the pattern looks for 0 or more of everything.
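The greedy/lazy difference and the backreference can be tried out in isolation. This is only a sketch: the quoted pattern below is a stricter variant assumed for illustration (anchored with ^ and $, and using + instead of * so that at least one quote is required), since the original group can also match zero quotes. The strings are the ones listed above, plus a made-up sample for greedy versus lazy.

```javascript
// Greedy vs lazy on a made-up sample: "a","b"
var sample = '"a","b"';
var greedy = sample.match(/"(.*)"/)[1];   // a","b  (runs to the last quote)
var lazy = sample.match(/"(.*?)"/)[1];    // a      (stops at the first closing quote)

// Hypothetical stricter variant of the quote group plus backreference:
// \1 must repeat exactly the run of quotes captured by group 1.
var quoted = /^("+|'+)(.*?)\1$/;

var inner1 = quoted.exec('"""a d djeje 2 2j"""')[2];  // a d djeje 2 2j
var inner2 = quoted.exec("'name'")[2];                // name
var inner3 = quoted.exec("'''''0'''''")[2];           // 0

// Mixed quotes: group 1 captures a run of one quote character only,
// and the string does not end with that same run, so there is no match.
var mixed1 = quoted.test("'\"---'\"");       // false
var mixed2 = quoted.test("\"''' 0 \"'''");   // false

// With the original pattern, "* is tried first and can match zero quotes,
// so on this made-up input the single-quoted field keeps its quotes.
var original = new RegExp("(^|\\t|,)(\"*|'*)(.*?)\\2(?=,|\\t|$)", "g");
var kept = "a,'b'".replace(original, "$3\t");   // "a\t'b'\t"

console.log(greedy, lazy, inner1, inner2, inner3, mixed1, mixed2);
```

The last check is worth noting: because the alternation tries "* before '* and "* is satisfied by zero quotes, the lazy group can swallow the single quotes itself before the second alternative is ever tried. If stripping single quotes matters, that behaviour would need testing against real data.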

