Greywacke

Splitting A String By "<br />"


Hi, it's me again with one of my weak points: regular expressions, this time. I need to split a string of attributes (a list of key = value pairs) by "<br />". Next, the script will try to separate the keys and values by " = ". The problem is that I cannot seem to split by HTML tags, as I have tried to do below.

arrt = arr[3].split(/<.*?>/g)[0];
alert(arrt.join("\n"));
var attr = new Array();
for (var i = 0; i < arrt.length; i++) {
    alert(arrt[i].split(" = "));
    attr[i] = arrt[i].split(" = ");
}

arr[3] is the attribute list retrieved from XML. Its formatting differs across sources, as in the following examples.

Source A:

canopy_req = pre-owned_white<br/>canopy_style = low_or_highline_standard<br/>budget = R5,000 to R7,500<br/>fitment = within_month<br/>vehicle_status = possession_yes<br/>vehicle_make_model = Isuzu - DCAB<br/>year_model = 2006
Source B:
products_description = Nissan LWB Canopy - Half Door<br />Bakkie model = 2002 - current model<br />Requirement = Pre-owned - White<br />Colour code = White Non-coded (standard)<br />Budget = R3,000 to R4,000<br />Fitment = Within the next 2 weeks
This needs to be done in order to parse the attributes for the required attribute (its key will contain "req", in varying case depending on the source of the lead). As I am not that confident with regular expressions, I need to ask for help. There are a few ways to skin this cat: I could search for the attributes in the string, but that would be hardcoding and would apply only to these two sources. An array of subarrays lets me check the keys for a case-insensitive "req" to find the required attribute. Any help would be appreciated, even just a nudge in the right direction :)

Edited by Pierre 'Greywacke' du Toit
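A minimal sketch of the array-of-subarrays idea described above, assuming the attribute string has already been split into "key = value" lines. The helper names `toPairs` and `findRequired` are hypothetical, not from the thread. Each line is cut at its first " = " so that a value containing " = " would stay intact, and the required attribute is found by a case-insensitive search for "req" in the key.

```javascript
// Hypothetical helper: turn ["key = value", ...] into [[key, value], ...].
// Cuts at the FIRST " = " only; lines without " = " are skipped.
function toPairs(lines) {
  const pairs = [];
  for (const line of lines) {
    const sep = line.indexOf(" = ");
    if (sep === -1) continue;
    pairs.push([line.slice(0, sep), line.slice(sep + 3)]);
  }
  return pairs;
}

// Hypothetical helper: return the first [key, value] pair whose key
// contains "req" in any case, or null if there is none.
function findRequired(pairs) {
  for (const [key, value] of pairs) {
    if (key.toLowerCase().indexOf("req") !== -1) return [key, value];
  }
  return null;
}

// Usage with two lines taken from source A.
const pairs = toPairs(["canopy_req = pre-owned_white", "budget = R5,000 to R7,500"]);
console.log(findRequired(pairs)); // logs the canopy_req / pre-owned_white pair
```

Because the key test is case-insensitive, the same helper would also pick up source B's "Requirement" key without hardcoding either source.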


Why not just this:

arrt = arr[3].split("<br />")[0];

or

arrt = arr[3].split(/<br \/>/g)[0];

Or, to do what I think you were trying to do (split on any HTML tag):

arrt = arr[3].split(/<.+>/g)[0];
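A quick check of the literal-string suggestion against both sources, as a minimal sketch using shortened copies of the sample strings from the question. Source A writes the tag as "<br/>" while source B writes "<br />", so a literal "<br />" only splits source B. The `/<br\s*\/?>/i` pattern is an assumed alternative (not from the thread) that accepts `<br>`, `<br/>` and `<br />`.

```javascript
// Shortened copies of the two sample strings from the question.
const sourceA = "canopy_req = pre-owned_white<br/>canopy_style = low_or_highline_standard";
const sourceB = "products_description = Nissan LWB Canopy - Half Door<br />Requirement = Pre-owned - White";

// A literal "<br />" only matches source B's spelling of the tag.
const literalA = sourceA.split("<br />"); // nothing split: 1 element
const literalB = sourceB.split("<br />"); // 2 elements

// Assumed alternative: <br>, <br/> or <br /> in any letter case.
const brTag = /<br\s*\/?>/i;
const regexA = sourceA.split(brTag); // 2 elements
const regexB = sourceB.split(brTag); // 2 elements

// Note: appending [0] to any of these calls keeps only the first piece,
// which is a string, not the array of attributes.
console.log(literalA.length, literalB.length, regexA.length, regexB.length); // 1 2 2 2
```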


Thanks jkloth, yes, I was trying to split on any HTML tag because of the differences in the delimiters... :) Going to try your version now :)

Okay, tried it, and I have made a few changes to the code:

var opt1 = document.createElement('option');
arrt = arr[3].split(/<.+>/g);
alert(arrt.join("\n"));
var attr = new Array();
for (var i = 0; i < arrt.length; i++) {
    alert(arrt[i].split(" = "));
    attr[i] = arrt[i].split(" = ")[0];
}

Thank you for helping out with that regular expression :)


Hmm... it seems I have found a temporary solution to my problem, but it raises another error that I can't figure out for the life of me (oh, the woes of working under a galvanised iron roof without air conditioning!). Here is the updated piece of code:

//alert(arr[3]);
var opt1 = document.createElement('option');
arrt = arr[3].split(/<.+>/g);
//alert(arrt);
var attr = new Array();
for (var i = 0; i < arrt.length; i++) {
    attr[i] = arrt[i].split(" = ")[0];
}
var prod = "";
var cstl = "";
var preq = "";
for (var i = 0; i < attr.length; i++) {
    if (attr[i][0].toLowerCase().indexOf("req") != -1} {
        preq = attr[i][1];
    } else if (attr[i][0].toLowerCase().indexOf("prod") != -1 || attr[i][0].toLowerCase().indexOf("make") != -1) {
        prod = attr[i][1];
    } else if (attr[i][0].toLowerCase().indexOf("make") != -1) {
        cstl = attr[i][1];
    }
}
opt1.text = "\u00a0\u00a0\u00a0#" + (parseInt(arr[0]) + 9001100) + "\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0" +
        arr[6] + "\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0" +
        prod + cstl + " ( " + preq + " )";

Unfortunately, this generates the following error in Firefox:

Error: missing ) after condition
Source File: http://www.ferrety.co.za/fab/scripts/ajax_prospects.js
Line: 607, Column: 53
Source Code: if (attr[0].toLowerCase().indexOf("req") != -1} {
Please help! I am struggling to see the forest for the trees... Oh, and please do not click the link; I think it requires a password. The relevant code is above.


Oh, never mind: there's a closing curly brace instead of a closing parenthesis. The problem now is that this doesn't retrieve the vehicle_make_model, canopy_style and canopy_req attributes, or the products_description and Requirement attributes... 0o What could possibly be wrong here? 0o


What do you mean: are the preq, prod, and cstl variables empty? You can use console.log to send the whole attr array to Firebug so that you can see the data there; it might not be what you expect.


The array preparation code has changed now: I discovered I was only populating the attr array with element 0 from the split (*doh* :)). This is almost done (more or less resolved); here is the updated code:

alert(arr[3]);
var opt1 = document.createElement('option');
arrt = arr[3].split(/<.+>/g);
//alert(arrt);
var attr = new Array();
for (var i = 0; i < arrt.length; i++) {
    attr[i] = arrt[i].split(" = ");
    alert(attr[i].join("\n"));
}
var produ = "";    // product name
var pstyl = "";    // product style
var prequ = "";    // product requirement
for (var n = 0; n < attr.length; n++) {
    alert(attr[n][0].toLowerCase() + "\n" + attr[n][0].toLowerCase().indexOf("req") +
        "\n" + attr[n][0].toLowerCase().indexOf("prod") +
        "\n" + attr[n][0].toLowerCase().indexOf("make") +
        "\n" + attr[n][0].toLowerCase().indexOf("styl"));
    if (attr[n][0].toLowerCase().indexOf("req") != -1) {
        prequ = attr[n][1];
    } else if (attr[n][0].toLowerCase().indexOf("prod") != -1 ||
               attr[n][0].toLowerCase().indexOf("make") != -1) {
        produ = attr[n][1];
    } else if (attr[n][0].toLowerCase().indexOf("styl") != -1) {
        pstyl = attr[n][1];
    }
}
opt1.text = "\u00a0\u00a0\u00a0#" + (parseInt(arr[0]) + 9001100) + "\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0" +
    arr[6] + "\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0" +
    produ.concat(pstyl) + " ( " + prequ + " )";

PS justsomeguy, if I alert the data that it's processing, surely that shows what's there (especially with strings)? For some reason it's not parsing all the attributes, though: it's only parsing the first two or so elements of the array, each of which should contain two subelements, the attribute key and value.
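The element-0 mistake described in this post is easy to reproduce outside the page. A minimal sketch using two of source A's attributes (the names `before` and `after` are illustrative only): with `[0]` on the split, each stored entry is the key string, so `attr[i][0]` is just its first character and `attr[i][1]` its second, and the `indexOf("req")` and `indexOf("make")` tests can never match.

```javascript
// Two of source A's attributes, as they look after splitting on the tags.
var arrt = ["canopy_req = pre-owned_white", "vehicle_make_model = Isuzu - DCAB"];

// Earlier version: only element 0 of each split is stored (the key string).
var before = [];
for (var i = 0; i < arrt.length; i++) {
  before[i] = arrt[i].split(" = ")[0];
}
// before[0] is "canopy_req", so before[0][0] is the character "c"
// and before[0][1], which the loop treated as the value, is "a".
console.log(before[0][0], before[0][1]); // c a
console.log(before[0][0].toLowerCase().indexOf("req")); // -1, so the test never matches

// Fixed version: store the whole [key, value] array.
var after = [];
for (var j = 0; j < arrt.length; j++) {
  after[j] = arrt[j].split(" = ");
}
console.log(after[0][0], after[0][1]); // canopy_req pre-owned_white
```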


The problem is with the split(/<.+>/g), though... before it I have all the attributes; after it, only the first two come back, and those are not always the attributes wanted. Well, I did some checking, and /<.+>/g does match any HTML tag: the . matches any character, and the + means one or more occurrences of the preceding character. Perhaps the / in the br tags (XHTML line breaks, as opposed to <br>) is breaking the regular expression, I don't know... it could even be the >.
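One way to test the "/" theory is to ask the pattern what it actually matched. A minimal sketch using the Tata lead string that appears in this thread: `String.prototype.match` with the `g` flag returns every substring the pattern consumed, and that is exactly what `split` removes. Here it returns a single match running from the first "<br />" to the last one, which points at the `.+` rather than at the "/".

```javascript
const s = "products_description = Tata D/C Canopy - Half Door<br />Bakkie model = 2004 - current model<br />Requirement = New - White<br />Colour code = White Non-coded (standard)<br />Budget = R3,000 to R4,000<br />Fitment = ASAP";

// match() with the g flag lists every substring the pattern consumed.
const matches = s.match(/<.+>/g);
console.log(matches.length); // 1: one single match, not five separate tags
console.log(matches[0]);     // from the first "<br />" through the last "<br />"

// split() removes that one match, leaving only the first and last attributes.
console.log(s.split(/<.+>/g));
```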


Alert only shows scalar data; if you send an array to alert it just shows "Array". But if you send it to Firebug's console you'll be able to look at all of the values in it. It's much easier to debug when you can track the actual data instead of making assumptions about it. You could write a loop to go through the array and alert everything, but then you see each element only once and have to click through all of the boxes; it's much easier to inspect the array directly in the DOM through the console.


In Firefox it shows the array elements joined with commas, though. The current site is an admin site for use in Firefox. I can tell what is scalar because I know the data type of the variable being alerted :) I did click through the alerts one at a time, stepping through the elements in the array via the script. I've installed Firebug now as you suggested, but I don't see how this changes what I already know beyond a doubt...
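Both observations can be checked directly. `alert()` converts its argument to a string, and for an array that means `join(",")` applied to every element, nested arrays included, so an array of [key, value] subarrays is flattened into one comma-separated line and its structure disappears. A minimal sketch, with sample data modelled on the Tata lead; `JSON.stringify` is shown as one way to keep the nesting visible in a single string.

```javascript
const attr = [["products_description", "Tata D/C Canopy - Half Door"], ["Fitment", "ASAP"]];

// What alert() would display: the string conversion flattens the nesting.
console.log(String(attr));
// products_description,Tata D/C Canopy - Half Door,Fitment,ASAP

// JSON.stringify keeps the [key, value] structure visible.
console.log(JSON.stringify(attr));
// [["products_description","Tata D/C Canopy - Half Door"],["Fitment","ASAP"]]
```

In the flattened form you cannot tell two pairs from one four-element array, which is the kind of detail a console inspector makes obvious.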


So what's the before and after? What's the string you're splitting, and what array does that result in?

products_description = Tata D/C Canopy - Half Door<br />Bakkie model = 2004 - current model<br />Requirement = New - White<br />Colour code = White Non-coded (standard)<br />Budget = R3,000 to R4,000<br />Fitment = ASAP
is the string that is split first. Splitting it yields the following two arrays (the first and last attributes in the string), with each element on its own line. The loop does run through the entire split result.

products_description
Tata D/C Canopy - Half Door

Fitment
ASAP
Then it loops through the "saved arrays", checking each one to see whether it is a needed attribute. For the first element, the alert shows the lowercased key followed by the indexOf results for "req", "prod", "make" and "styl":

products_description
-1
0
-1
-1

And this is the second element:

fitment
-1
-1
-1
-1

What happened to the other elements? 0o


This is the current script:

// ONLY PARSING THE FIRST TWO ATTRIBUTES... WTF?
alert(arr[3]);
var opt1 = document.createElement('option');
arrt = arr[3].split(/<.+\/>/g,-1);
//alert(arrt);
var attr = new Array();
for (var i = 0; i < arrt.length; i++) {
    attr[i] = arrt[i].split(/.=./g,-1);
    alert(attr[i].join("\n"));
}
var produ = "";    // product name
var pstyl = "";    // product style
var prequ = "";    // product requirement
for (var n = 0; n < attr.length; n++) {
    alert(attr[n][0].toLowerCase()+"\n"+attr[n][0].toLowerCase().indexOf("req")+
        "\n"+attr[n][0].toLowerCase().indexOf("prod")+
        "\n"+attr[n][0].toLowerCase().indexOf("make")+
        "\n"+attr[n][0].toLowerCase().indexOf("styl"));
    if (attr[n][0].toLowerCase().indexOf("req") != -1) {
        prequ = attr[n][1];
    } else if (attr[n][0].toLowerCase().indexOf("prod") != -1 ||
            attr[n][0].toLowerCase().indexOf("make") != -1) {
        produ = attr[n][1];
    } else if (attr[n][0].toLowerCase().indexOf("styl") != -1) {
        pstyl = attr[n][1];
    }
}
opt1.text = "\u00a0\u00a0\u00a0#" + (parseInt(arr[0]) + 9001100) + "\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0" +
    arr[6] + "\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0" +
    produ.concat(pstyl) + " ( " + prequ + " )";
// ONLY PARSING THE FIRST TWO ATTRIBUTES... WTF?

How would I match optional whitespace, such as a space, a no-break space, a tab, a carriage return or a newline (or no whitespace at all), at the positions marked by the X's in the following regexp?

/X<.+X\/>X/g
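In JavaScript, `\s` already covers the space, tab, carriage return, newline and the no-break space (U+00A0), and `\s*` also allows no whitespace at all, so `\s*` can go at each X. A minimal sketch under two assumptions not taken from the thread: the tags are `<br>` variants (so a `<br\s*\/?>` tag pattern replaces the general `.+`), and keys and values are separated by "=" with optional whitespace around it. The `messy` sample string is invented for illustration.

```javascript
// \s* at each X: any run of whitespace (including U+00A0), or none at all.
// Assumed tag pattern: <br>, <br/> or <br />, any letter case.
const brWithSpace = /\s*<br\s*\/?>\s*/gi;
// Assumed key/value separator: "=" with optional whitespace on either side.
const equalsWithSpace = /\s*=\s*/;

// Invented sample: trailing space + no-break space, then CR LF and a tab.
const messy = "canopy_req = pre-owned_white \u00a0<br/>\r\n\tbudget=R5,000 to R7,500";

const attributes = messy.split(brWithSpace).map(part => part.split(equalsWithSpace));
console.log(JSON.stringify(attributes));
// [["canopy_req","pre-owned_white"],["budget","R5,000 to R7,500"]]
```

Compared with splitting on `/.=./`, which consumes exactly one character on each side of the "=", `/\s*=\s*/` also copes with "key=value" written without spaces. It still splits on every "=", so a value that itself contains "=" would come back in more than two pieces.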


This is a "greedy" match. This pattern:

/<.+>/g

applied to this string:

products_description = Tata D/C Canopy - Half Door<br />Bakkie model = 2004 - current model<br />Requirement = New - White<br />Colour code = White Non-coded (standard)<br />Budget = R3,000 to R4,000<br />Fitment = ASAP

matches everything from the first "<" to the last ">", so the split leaves only the first and last attributes. Greedy means that it matches as much as it possibly can; ungreedy means it matches as little as it possibly can. You can modify your pattern so that it looks for "<", then any characters that are not ">", then ">":

/<[^>]+>/g

I think that should work. I'm pretty sure that's telling it to find "<", then one or more characters that are not ">", followed by ">".
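A side-by-side check of the greedy pattern, the lazy ("ungreedy") pattern `/<.*?>/g` from the first post, and the negated character class suggested here, as a minimal sketch on the same Tata string. The greedy `.+` runs to the last ">" in the string, the lazy `.*?` stops at the first ">" it can, and `[^>]+` cannot cross a ">" at all, so the last two both split on each of the five tags.

```javascript
const s = "products_description = Tata D/C Canopy - Half Door<br />Bakkie model = 2004 - current model<br />Requirement = New - White<br />Colour code = White Non-coded (standard)<br />Budget = R3,000 to R4,000<br />Fitment = ASAP";

const greedy = s.split(/<.+>/g);     // .+ runs to the LAST ">" in the string
const lazy = s.split(/<.*?>/g);      // .*? stops at the FIRST ">" it reaches
const negated = s.split(/<[^>]+>/g); // [^>]+ can never step over a ">"

console.log(greedy.length, lazy.length, negated.length); // 2 6 6
console.log(negated[2]); // Requirement = New - White
```

The negated class has the advantage of not depending on backtracking behaviour at all: whatever follows, a match cannot extend past the first ">".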


Thanks justsomeguy :) I'll try this right away :)

AWESOME :) As I said earlier, there are many ways to skin a cat :) This one seems the most straightforward, however ;)
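Putting the pieces of this thread together, a minimal end-to-end sketch: split on any tag with `/<[^>]+>/g`, split each attribute on " = ", then classify keys with the same "req" / "prod" or "make" / "styl" checks and the same precedence as the thread's loop. The wrapper name `summarise` and its object return value are hypothetical; the field names `produ`, `pstyl` and `prequ` come from the thread's script, and building `opt1.text` is left out.

```javascript
// Hypothetical wrapper around the approach settled on in this thread.
function summarise(attributeString) {
  // Split on any tag, then split each attribute into [key, value].
  const attr = attributeString.split(/<[^>]+>/g).map(a => a.split(" = "));
  let produ = "", pstyl = "", prequ = ""; // product, style, requirement
  for (const pair of attr) {
    const key = pair[0].toLowerCase();
    const value = pair.length > 1 ? pair[1] : ""; // tolerate pieces without " = "
    if (key.indexOf("req") !== -1) {
      prequ = value;
    } else if (key.indexOf("prod") !== -1 || key.indexOf("make") !== -1) {
      produ = value;
    } else if (key.indexOf("styl") !== -1) {
      pstyl = value;
    }
  }
  return { produ: produ, pstyl: pstyl, prequ: prequ };
}

// Usage with the Tata lead from this thread.
console.log(summarise("products_description = Tata D/C Canopy - Half Door<br />Bakkie model = 2004 - current model<br />Requirement = New - White<br />Fitment = ASAP"));
```

Because the key checks are substring searches rather than exact names, the same function handles both source A's `canopy_req` / `vehicle_make_model` keys and source B's `Requirement` / `products_description` keys.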
