Fielding, posted September 11, 2017: I have an HTML file generated by dictionary.com that contains words and their meanings. If you could show me how to convert one or two of the words to CSV, I will do the rest. I need the CSV so I can import it into Anki and learn English. Best regards. Attachment: html.html
niche, posted September 11, 2017: Just put commas between the words. Otherwise, if you have a lot of them, you'll need to use something like PHP.
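For the "use something like PHP" route, here is a minimal sketch, assuming PHP 7.4 or later. Rather than placing commas by hand, fputcsv() adds the delimiters and quotes any field that needs it. The function name wordsToCsv() and the two sample entries are hypothetical; the definitions are borrowed from the dictionary.com sample in this thread.

```php
<?php
// Sketch: build CSV text from word => meaning pairs with fputcsv(),
// which adds the commas and quotes fields that contain spaces or commas.
// wordsToCsv() and the sample entries are hypothetical (PHP 7.4+ assumed).

function wordsToCsv(array $words): string
{
    $fh = fopen('php://temp', 'r+');   // in-memory buffer instead of a real file
    foreach ($words as $word => $meaning) {
        // Explicit ',' delimiter, '"' enclosure and no backslash escape character.
        fputcsv($fh, [$word, $meaning], ',', '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

$words = [
    'shrug'     => 'to raise and contract the shoulders.',
    'shrug off' => 'to disregard; minimize',
];
$csv = wordsToCsv($words);
echo $csv;
```

To save the result, pass it to file_put_contents('words.csv', $csv); the file name is only an example.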
Fielding (thread author), posted September 12, 2017: Please show me, using the attached file. Best regards.
niche, posted September 12, 2017: Please post the relevant file content or a sample.
davej, posted September 12, 2017

Since the file contains a series of rather complex word definitions, such as...

    <p><b><h1>shrugs </h1></b></p>
    <div class="source-data">
      <div class="def-list">
        <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
          <header class="luna-data-header">
            <span class="dbox-pg">verb (used with object)</span>, <span class="dbox-bold">shrugged, </span><span class="dbox-bold" data-syllable="shrug·ging.">shrugging.</span>
          </header>
          <div class="def-set">
            <span class="def-number">1.</span>
            <div class="def-content"> to raise and contract (the shoulders), expressing indifference, disdain, etc. </div>
          </div>
        </section>
        <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
          <header class="luna-data-header">
            <span class="dbox-pg">verb (used without object)</span>, <span class="dbox-bold">shrugged, </span><span class="dbox-bold" data-syllable="shrug·ging.">shrugging.</span>
          </header>
          <div class="def-set">
            <span class="def-number">2.</span>
            <div class="def-content"> to raise and contract the shoulders. </div>
          </div>
        </section>
        <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
          <header class="luna-data-header">
            <span class="dbox-pg">noun</span>
          </header>
          <div class="def-set">
            <span class="def-number">3.</span>
            <div class="def-content"> the movement of raising and contracting the shoulders. </div>
          </div>
          <div class="def-set">
            <span class="def-number">4.</span>
            <div class="def-content"> a short sweater or jacket that ends above or at the waistline. </div>
          </div>
        </section>
        <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
          <header class="luna-data-header">
            <span class="dbox-pg">Verb phrases</span>
          </header>
          <div class="def-set">
            <span class="def-number">5.</span>
            <div class="def-content">
              <span class="dbox-bold">shrug off, </span>
              <ol class="def-sub-list">
                <li> to disregard; minimize:
                  <div class="def-block def-inline-example"><span class="dbox-example">to shrug off an insult.</span></div>
                </li>
                <li> to rid oneself of:
                  <div class="def-block def-inline-example"><span class="dbox-example">to shrug off the effects of a drug.</span></div>
                </li>
              </ol>
            </div>
          </div>
        </section>
      </div>
      <div class="tail-wrapper">

...are you saying you would like the above to be converted into...

    shrugs,verb (used with object),shrugged,shrugging.,1.,to raise and contract (the shoulders),expressing indifference, disdain, etc.,verb (used without object),shrugged,shrugging.,2.,to raise and contract the shoulders.,3.,the movement of raising and contracting the shoulders.,4.,a short sweater or jacket that ends above or at the waistline.,,Verb phrases,5.,shrug off, to disregard; minimize:,to shrug off an insult.,to rid oneself of:,to shrug off the effects of a drug.

...? That introduces several problems. For one thing, there are embedded commas in the text. Also, each entry has a variable number of elements, so the rows would not line up into fixed columns.
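Both problems can be seen in a few lines of PHP. This sketch (PHP 7.4+ assumed) uses sentences from the shrugs entry above: joining fields with bare commas turns 4 intended fields into 7, while a fixed two-column layout (the word, then all senses in one field) written with fputcsv() parses back as exactly 2. The two-column layout and the numbered "1. ... 2. ..." joining are assumptions chosen for illustration, not something dictionary.com or Anki requires.

```php
<?php
// Sketch of both problems, using text from the "shrugs" entry.
// ASSUMPTION: a fixed two-column layout (word, all senses in one field).

$word   = 'shrugs';
$senses = [
    'to raise and contract (the shoulders), expressing indifference, disdain, etc.',
    'to raise and contract the shoulders.',
    'the movement of raising and contracting the shoulders.',
];

// Problem: joining with bare commas lets embedded commas create extra columns.
$naive       = implode(',', array_merge([$word], $senses));
$naiveFields = explode(',', $naive);          // 7 pieces for 4 intended fields

// Fix for variable length: always exactly two columns per word.
$numbered = [];
foreach ($senses as $i => $sense) {
    $numbered[] = ($i + 1) . '. ' . $sense;
}
$row = [$word, implode(' ', $numbered)];

// Fix for embedded commas: fputcsv() wraps the second field in double quotes.
$fh = fopen('php://temp', 'r+');
fputcsv($fh, $row, ',', '"', '');
rewind($fh);
$line = stream_get_contents($fh);
fclose($fh);

$parsed = str_getcsv(rtrim($line, "\n"), ',', '"', '');
printf("naive: %d fields, quoted: %d fields\n", count($naiveFields), count($parsed));
// prints: naive: 7 fields, quoted: 2 fields
```

Putting every sense into one field trades per-sense columns for rows that always have the same shape, which is what an importer expecting fixed fields needs.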
Fielding (thread author), posted September 14, 2017: Understood. It's a difficult question. Thanks. Best regards.
dsonesuk, posted September 15, 2017: In HTML, an h1 cannot be inside a p (paragraph) element, or vice versa. If you are using a PHP server script, you could wrap each string of text in a specific element, such as the rarely used <b>...</b>, and then replace those tags with double quotes. That way, commas inside the text are treated as part of the field instead of as delimiters (an embedded double quote still has to be doubled, as CSV requires). Then use strip_tags() to remove all remaining HTML tags so that only the quoted text is left, and separate the fields with a comma delimiter.
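A small sketch of the <b>-marker idea, assuming PHP 7.4 or later. It is a variation rather than the exact steps described above: instead of replacing the tags with quote characters in place, it collects the text of each <b>...</b> pair, strips the inner tags, and wraps the text in double quotes, doubling any embedded quote. The function name boldMarkedHtmlToCsvLine() and the shortened HTML sample are hypothetical.

```php
<?php
// Sketch: every <b>...</b> pair becomes one quoted CSV field.
// boldMarkedHtmlToCsvLine() and $html are hypothetical examples.

function boldMarkedHtmlToCsvLine(string $html): string
{
    // Text of every <b>...</b> pair, in document order (non-greedy, across newlines).
    preg_match_all('#<b>(.*?)</b>#s', $html, $m);

    $fields = [];
    foreach ($m[1] as $inner) {
        // Drop inner tags such as <span>, decode entities, collapse whitespace.
        $text = html_entity_decode(strip_tags($inner), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $text = trim(preg_replace('/\s+/', ' ', $text));
        // CSV quoting: double any embedded quote, then wrap the whole field.
        $fields[] = '"' . str_replace('"', '""', $text) . '"';
    }
    return implode(',', $fields);
}

$html = '<h1><b>shrugs</b> </h1>'
    . '<header><b><span class="dbox-pg">verb (used with object)</span>, '
    . '<span class="dbox-bold">shrugged, </span></b></header>'
    . '<b><span class="def-number">1.</span></b>'
    . '<div class="def-content"><b>to raise and contract (the shoulders), '
    . 'expressing indifference, disdain, etc. </b></div>';

$line = boldMarkedHtmlToCsvLine($html);
echo $line, "\n";
print_r(str_getcsv($line, ',', '"', ''));   // the commas stay inside their fields
```

Because each field is quoted, a CSV reader such as str_getcsv() keeps "verb (used with object), shrugged," together as one field.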
dsonesuk, posted September 15, 2017

Example page:

    <!DOCTYPE html>
    <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <meta name="viewport" id="viewport" content="target-densitydpi=high-dpi,initial-scale=1.0" />
      <title>Document Title</title>
    </head>
    <body>
      <h1><b>shrugsxxxxx</b> </h1>
      <div class="source-data">
        <div class="def-list">
          <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
            <header class="luna-data-header">
              <b> <span class="dbox-pg">verb (used with object)</span>, <span class="dbox-bold">shrugged, </span><span class="dbox-bold" data-syllable="shrug·ging.">shrugging.</span> </b>
            </header>
            <div class="def-set">
              <b><span class="def-number">1.</span></b>
              <div class="def-content">
                <b>to raise and contract (the shoulders), expressing indifference, disdain, etc. </b></div>
            </div>
          </section>
          <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
            <header class="luna-data-header">
              <b><span class="dbox-pg">verb (used without object)</span>, <span class="dbox-bold">shrugged, </span><span class="dbox-bold" data-syllable="shrug·ging.">shrugging.</span> </b>
            </header>
            <div class="def-set">
              <b><span class="def-number">2.</span></b>
              <div class="def-content">
                <b>to raise and contract the shoulders.</b>
              </div>
            </div>
          </section>
          <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
            <header class="luna-data-header">
              <b><span class="dbox-pg">noun</span> </b>
            </header>
            <div class="def-set">
              <b><span class="def-number">3.</span></b>
              <div class="def-content">
                <b>the movement of raising and contracting the shoulders.</b>
              </div>
            </div>
            <div class="def-set">
              <b><span class="def-number">4.</span></b>
              <div class="def-content">
                <b>a short sweater or jacket that ends above or at the waistline.</b>
              </div>
            </div>
          </section>
          <section class="def-pbk ce-spot" data-collapse-expand='{"target": ".def-set", "type": "def"}'>
            <header class="luna-data-header">
              <b><span class="dbox-pg">Verb phrases</span> </b>
            </header>
            <div class="def-set">
              <b><span class="def-number">5.</span></b>
              <div class="def-content">
                <b><span class="dbox-bold">shrug off, </span></b>
                <ol class="def-sub-list">
                  <li>
                    <b>to disregard; minimize:</b>
                    <div class="def-block def-inline-example">
                      <b><span class="dbox-example">to shrug off an insult.</span></b></div>
                  </li>
                  <li>
                    <b>to rid oneself of:</b>
                    <div class="def-block def-inline-example"><b><span class="dbox-example">to shrug off the effects of a drug.</span></b></div>
                  </li>
                </ol>
              </div>
            </div>
          </section>
        </div>
        <div class="tail-wrapper"> </div>
      </div>
    </body>
    </html>

PHP code to remove the tags, replace each comma with its encoded HTML alternative (&#44;), replace each closing bold tag with a comma delimiter while removing the opening bold tags, and finally restore the real commas inside each field (the example page itself can be PHP or HTML):

    <?php
    header("Content-Type: text/plain");

    $c = curl_init('http://localhost/web_testing/example.php');
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    //curl_setopt(... other options you want...)
    $html = curl_exec($c);

    if (curl_error($c)) {
        die(curl_error($c));
    } else {
        // Remove the <title> element together with its text.
        $html = strip_tags_content($html, '<title>', TRUE);
        // Encode the existing commas so they are not mistaken for delimiters.
        $html = preg_replace('/,/', '&#44;', $html);
        // Remove the opening <b> tags...
        $html = preg_replace('/<b>/', '', $html);
        // ...and turn each closing </b> into a comma delimiter.
        $html = preg_replace('/<\/b>/', ',', $html);
        // Remove every remaining tag and collapse whitespace.
        $html = strip_tags($html, '<br>');
        $html = preg_replace('/\s+/', ' ', $html);
        $html = rtrim($html);
        $html = rtrim($html, ',');
        //echo $html;

        $list[] = $html;
        $file = fopen("contacts.csv", "w");
        foreach ($list as $line) {
            // Split on the delimiters, then turn &#44; back into real commas;
            // fputcsv() quotes those fields so the commas stay inside them.
            $fields = str_replace('&#44;', ',', explode(',', $line));
            fputcsv($file, $fields);
        }
        fclose($file);
    }

    // Get the status code
    $status = curl_getinfo($c, CURLINFO_HTTP_CODE);
    curl_close($c);

    // Removes elements and their content: only the listed tags when $invert is TRUE,
    // every tag except the listed ones when $invert is FALSE.
    function strip_tags_content($text, $tags = '', $invert = FALSE) {
        preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
        $tags = array_unique($tags[1]);

        if (is_array($tags) AND count($tags) > 0) {
            if ($invert == FALSE) {
                return preg_replace('@<(?!(?:' . implode('|', $tags) . ')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
            } else {
                return preg_replace('@<(' . implode('|', $tags) . ')\b.*?>.*?</\1>@si', '', $text);
            }
        } elseif ($invert == FALSE) {
            return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        return $text;
    }

Resulting contacts.csv:

    " shrugsxxxxx"," verb (used with object), shrugged, shrugging. "," 1."," to raise and contract (the shoulders), expressing indifference, disdain, etc. "," verb (used without object), shrugged, shrugging. "," 2."," to raise and contract the shoulders."," noun "," 3."," the movement of raising and contracting the shoulders."," 4."," a short sweater or jacket that ends above or at the waistline."," Verb phrases "," 5."," shrug off, "," to disregard; minimize:"," to shrug off an insult."," to rid oneself of:"," to shrug off the effects of a drug."

Code for reading the CSV file:

    <!DOCTYPE html>
    <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <meta name="viewport" id="viewport" content="target-densitydpi=high-dpi,initial-scale=1.0,user-scalable=no" />
      <title>Document Title</title>
    </head>
    <body>
    <?php
    $file = fopen("contacts.csv", "r");
    // The file holds a single row, so one fgetcsv() call reads every field.
    foreach (fgetcsv($file) as $f) {
        echo htmlspecialchars($f) . '<br>';
    }
    fclose($file);
    ?>
    </body>
    </html>

It may need adjusting to handle other elements, but it works: I opened the result in Excel.
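A different technique, for comparison: instead of regular expressions and strip_tags(), parse the page with PHP's DOM extension (DOMDocument and DOMXPath) and write one two-column row per word, which maps onto a basic two-field (Front/Back) note when importing a CSV into Anki. This is a hedged sketch, assuming PHP 7.4+ with the DOM extension enabled and assuming each entry is an <h1> followed by div elements with class def-content, as in the samples in this thread. The function names dictionaryHtmlToRows() and rowsToCsv() are hypothetical.

```php
<?php
// Alternative technique (not the regex/strip_tags approach above): parse the
// HTML with PHP's DOM extension and emit one "word","definitions" row per entry.
// ASSUMPTION: each entry is an <h1> followed by div.def-content elements, as in
// the samples in this thread. Function names are hypothetical.

function dictionaryHtmlToRows(string $html): array
{
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true);          // the HTML is not strictly valid
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);  // hint that the input is UTF-8
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Headwords and definition blocks, returned in document order.
    $nodes = $xpath->query('//h1 | //div[contains(concat(" ", normalize-space(@class), " "), " def-content ")]');

    $rows = [];
    foreach ($nodes as $node) {
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if ($node->nodeName === 'h1') {
            $rows[] = [$text, []];                     // start a new word
        } elseif ($rows !== [] && $text !== '') {
            $rows[count($rows) - 1][1][] = $text;      // attach to the latest word
        }
    }
    return $rows;
}

function rowsToCsv(array $rows): string
{
    $fh = fopen('php://temp', 'r+');
    foreach ($rows as [$word, $defs]) {
        $numbered = [];
        foreach ($defs as $i => $def) {
            $numbered[] = ($i + 1) . '. ' . $def;      // our own numbering
        }
        // Two columns per line: the word and all of its definitions.
        fputcsv($fh, [$word, implode(' ', $numbered)], ',', '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

$sample = '<h1>shrugs</h1>'
    . '<div class="def-set"><span class="def-number">1.</span><div class="def-content">'
    . ' to raise and contract (the shoulders), expressing indifference, disdain, etc. </div></div>'
    . '<div class="def-set"><span class="def-number">2.</span><div class="def-content">'
    . ' to raise and contract the shoulders. </div></div>';

echo rowsToCsv(dictionaryHtmlToRows($sample));
```

On the real export, rowsToCsv(dictionaryHtmlToRows(file_get_contents('html.html'))) would produce the whole file, provided every entry follows the same h1/def-content pattern, which is an assumption.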
Fielding (thread author), posted September 16, 2017: Thanks, dsonesuk. I will try this!