tantachar07 Posted November 14, 2006
NEWS FLASH REPORT
What is the hardest among the scripts?
-------------------------------------------
Voting will end on November 25, 2006. I do not know your GMT offset, so just go by November 25 where you are . . .

vchris Posted November 14, 2006
What is the purpose of this thread?

aspnetguy Posted November 14, 2006
Not sure... he has posted it 4 times (I deleted the other 3) but has not made a poll.

vchris Posted November 14, 2006
And "hardest script" is pretty relative. A newcomer to JavaScript (I assume) would find all of them hard.

aspnetguy Posted November 14, 2006
While someone who has learned C++ would probably find most other languages easy. For a beginner with no programming experience I would say C or C++ would be the hardest.

vchris Posted November 14, 2006
I remember those days when I was learning C++. It was wonderful.

aspnetguy Posted November 14, 2006
I can now write console apps with my eyes closed... problem is, who wants a console app these days :)
I have been working on some Win32 and OpenGL but it is slow going.

vchris Posted November 14, 2006
Could you create a console app that takes a Word webpage and cleans the code? I tried that in Perl but it's a lot more complicated than I thought...

aspnetguy Posted November 14, 2006
Theoretically yes, it would just be time consuming figuring out the string manipulation.

vchris Posted November 14, 2006
I guess I'll just stick to converting Word docs to HTML myself as long as they're not too big.

aspnetguy Posted November 14, 2006
Can you give me a sample Word doc you want to convert (preferably one that has most things you need to filter)? I'll look at it and tell you whether I have the time to whip something up.

vchris Posted November 14, 2006
Here is a doc that needed to be cleaned; I ended up just copy-pasting from Word to DW.
All I need is clean p tags (<p>) with no styles, classes, or anything in there. That goes for any tag, actually. No style tag in the head section, no matter what is in it. No empty <p> tags (<p> </p>).
I can try and search for the original Word doc if you need it.
Thanks!

<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Generator content="Microsoft Word 11 (filtered)">
<title>PREFACE</title>
<style>
<!--
 /* Font Definitions */
 @font-face {font-family:"Times New \(W1\)"; panose-1:2 2 6 3 5 4 5 2 3 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0cm; margin-bottom:.0001pt; punctuation-wrap:simple; text-autospace:none; font-size:12.0pt; font-family:"Times New Roman";}
h1 {margin-top:12.0pt; margin-right:0cm; margin-bottom:3.0pt; margin-left:21.6pt; text-indent:-21.6pt; page-break-after:avoid; punctuation-wrap:simple; text-autospace:none; font-size:16.0pt; font-family:Arial;}
@page Section1 {size:612.0pt 792.0pt; margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.Section1 {page:Section1;}
 /* List Definitions */
 ol {margin-bottom:0cm;}
ul {margin-bottom:0cm;}
-->
</style>
</head>
<body lang=EN-CA>
<div class=Section1>
<h1 style='margin:0cm;margin-bottom:.0001pt;text-indent:0cm'><a name="_Toc138070756"></a><a name="_Toc95619545">PREFACE</a></h1>
<p class=MsoNormal style='margin-right:-22.5pt;text-align:justify'> </p>
<p class=MsoNormal style='margin-right:-22.5pt;text-align:justify'>The purpose of this guidebook is to provide information to government agencies, industry and industrial associations, non-governmental organizations and the public about the methods used for the compilation of area source emissions for the 2002 National Emissions Inventory of Criteria Air Contaminants. The 2002 emissions inventory was compiled by the provincial and territorial representatives of the Emissions and Projections Working Group of the Canadian Council of Ministers of the Environment and published in the summer of 2005. Relevant area source emission sectors are identified with sources of base quantity, emission factors and other information used to derive emissions of TPM, PM<sub>10</sub>, PM<sub>2.5</sub>, SO<sub>x</sub>, NO<sub>x</sub>, VOCs and CO (and NH<sub><span style='font-family:"Times New \(W1\)"'>3</span></sub><span style='font-family:"Times New \(W1\)"'>,<sub></sub></span> where data were available) for the following major source categories:</p>
<p class=MsoNormal style='margin-right:-22.5pt;text-align:justify'> </p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-22.5pt;margin-bottom:0cm;margin-left:18.0pt;margin-bottom:.0001pt;text-align:justify;text-indent:-18.0pt'><span style='font-size:10.0pt;font-family:Symbol'>·<span style='font:7.0pt "Times New Roman"'>        </span></span>industrial sector;</p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-22.5pt;margin-bottom:0cm;margin-left:18.0pt;margin-bottom:.0001pt;text-align:justify;text-indent:-18.0pt'><span style='font-size:10.0pt;font-family:Symbol'>·<span style='font:7.0pt "Times New Roman"'>        </span></span>non-industrial fuel combustion sector; </p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-22.5pt;margin-bottom:0cm;margin-left:18.0pt;margin-bottom:.0001pt;text-align:justify;text-indent:-18.0pt'><span style='font-size:10.0pt;font-family:Symbol'>·<span style='font:7.0pt "Times New Roman"'>        </span></span>transportation sector;</p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-22.5pt;margin-bottom:0cm;margin-left:18.0pt;margin-bottom:.0001pt;text-align:justify;text-indent:-18.0pt'><span style='font-size:10.0pt;font-family:Symbol'>·<span style='font:7.0pt "Times New Roman"'>        </span></span>incineration sector;</p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-22.5pt;margin-bottom:0cm;margin-left:18.0pt;margin-bottom:.0001pt;text-align:justify;text-indent:-18.0pt'><span style='font-size:10.0pt;font-family:Symbol'>·<span style='font:7.0pt "Times New Roman"'>        </span></span>miscellaneous source sector; and </p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-22.5pt;margin-bottom:0cm;margin-left:18.0pt;margin-bottom:.0001pt;text-align:justify;text-indent:-18.0pt'><span style='font-size:10.0pt;font-family:Symbol'>·<span style='font:7.0pt "Times New Roman"'>        </span></span>open source sector.</p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-22.5pt;margin-bottom:0cm;margin-left:36.0pt;margin-bottom:.0001pt;text-align:justify'> </p>
<p class=MsoNormal style='margin-right:-22.5pt;text-align:justify'>The guidebook is arranged using these categories, with sections corresponding to specific contributing sectors within the above categories.</p>
<p class=MsoNormal style='margin-right:-22.5pt;text-align:justify'> </p>
<p class=MsoNormal style='margin-right:-22.5pt;text-align:justify'>Contributions to this guidebook were prepared by scientists and engineers from the Criteria Air Contaminants Section of the Pollution Data Division within Environment Canada. The work was originally edited by Canadian ORTECH Environmental Inc. and prepared for publication by Graphics Plus+.</p>
<p class=MsoNormal> </p>
<p class=MsoNormal style='text-align:justify'><b><span lang=EN-US>Note to Reader:</span></b></p>
<p class=MsoNormal style='text-align:justify'><b><span lang=EN-US> </span></b></p>
<p class=MsoNormal style='text-align:justify'><span lang=EN-US>The information contained in the guidebook may have been updated owing to more recent information that may have become available after the publication of this document. </span></p>
<p class=MsoNormal style='text-align:justify'><span lang=EN-US> </span></p>
<p class=MsoNormal style='text-align:justify'><span lang=EN-US>Please see the ‘Addendum’ for a description of any changes that were brought about to the guide since its publication.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
</div>
</body>
</html>

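The "clean tags, no styles or classes" requirement in the post above could be sketched in C++ roughly like this. It is only a minimal sketch, not code from the thread: `stripAttributes` is a hypothetical name, and it assumes that the `<style>` block and any HTML comments have already been removed and that no attribute value contains a `>` character.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): copies html to the result,
// dropping everything between a tag's name and its closing '>'.
// Assumes comments and <style> blocks were removed beforehand and that
// no attribute value contains '>'.
std::string stripAttributes(const std::string& html)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {
            out += html[i++];                    // ordinary text: copy as-is
            continue;
        }
        std::string::size_type close = html.find('>', i);
        if (close == std::string::npos) {
            out.append(html, i, std::string::npos); // unterminated tag: copy the rest
            break;
        }
        // The tag name runs from '<' to the first whitespace (or to '>').
        std::string::size_type nameEnd = i + 1;
        while (nameEnd < close &&
               !std::isspace(static_cast<unsigned char>(html[nameEnd])))
            ++nameEnd;
        out.append(html, i, nameEnd - i);        // "<p" or "</p"
        out += '>';
        i = close + 1;                           // skip the attributes
    }
    return out;
}
```

Calling `stripAttributes("<p class=MsoNormal style='text-align:justify'>Hi</p>")` would give `<p>Hi</p>`. Because the tag name ends at the first whitespace of any kind, a tag whose name is followed by a line break rather than a space is handled the same way.
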
aspnetguy Posted November 15, 2006
OK, here is what I have so far; it is not complete yet. It removes <style></style> and <meta> and catches most tags with attributes in them but misses tags like this

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string fileName;
    string contents;
    string tmp;
    size_t start;
    size_t end;

    cout << "Enter path and file name to load: ";
    cin >> fileName;

    FILE * pFile = fopen(fileName.c_str(), "r");
    if(pFile == NULL)
    {
        cout << "Could not open file! Exiting..." << endl << endl;
        system("pause");
        exit(1);
    }

    //obtain file size.
    fseek(pFile, 0, SEEK_END);
    long lSize = ftell(pFile);
    rewind(pFile);

    //allocate memory to contain the whole file.
    char * buffer = (char*) malloc(lSize);
    if(buffer == NULL)
    {
        cout << "Error initializing buffer! Exiting..." << endl << endl;
        system("pause");
        exit(2);
    }

    //copy the file into the buffer. The buffer is not null-terminated,
    //so pass the number of bytes actually read to assign().
    size_t bytesRead = fread(buffer, 1, lSize, pFile);
    contents.assign(buffer, bytesRead);

    //remove <style></style> (only if both the opening and closing tag exist)
    start = contents.find("<style");
    end = contents.find("</style>");
    if(start != string::npos && end != string::npos && end > start)
    {
        contents.erase(start, end + 8 - start);
    }

    //clean remaining HTML tags: each <...> is temporarily turned into [...]
    //so that the next find("<") moves on to the following tag
    tmp = contents;
    int loop = 0;
    while(tmp.find("<") != string::npos && loop < 100000)
    {
        start = tmp.find("<");
        end = tmp.find(">", start);
        if(end == string::npos) break; //unterminated tag, stop here
        tmp.replace(start, 1, "[");
        tmp.replace(end, 1, "]");
        string oldTag = tmp.substr(start, end - start + 1);
        size_t s = oldTag.find(" ");
        if(s != string::npos)
        {
            //keep only the tag name, drop the attributes
            string newTag = oldTag.substr(0, s) + "]";
            tmp.replace(start, oldTag.size(), newTag);
        }
        loop++;
    }

    //turn the brackets back into angle brackets
    //(note: literal [ and ] in the text get converted as well)
    loop = 0;
    while(tmp.find("[") != string::npos && loop < 100000)
    {
        start = tmp.find("[");
        end = tmp.find("]", start);
        if(end == string::npos) break;
        tmp.replace(start, 1, "<");
        tmp.replace(end, 1, ">");
        loop++;
    }

    //the <meta ...> tags were reduced to <meta> above; remove them
    while(tmp.find("<meta>") != string::npos)
    {
        tmp.replace(tmp.find("<meta>"), 6, "");
    }

    contents = tmp;
    cout << tmp << endl << endl;
    system("pause");

    //close file and free memory
    fclose(pFile);
    free(buffer);
    return 0;
}

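The program above does not yet cover the "no empty <p> tags" requirement from the earlier post. One way that step might look is sketched below. This is an assumption-laden sketch rather than part of aspnetguy's code: `removeEmptyParagraphs` is a hypothetical name, it expects the attributes to have been stripped already (so every paragraph opens with exactly `<p>`), and it treats only ASCII whitespace and the literal `&nbsp;` entity as "empty" content.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): removes <p>...</p> pairs whose
// body consists only of ASCII whitespace and/or "&nbsp;" entities.
// Assumes attributes were already stripped, so paragraphs open with "<p>".
std::string removeEmptyParagraphs(const std::string& html)
{
    const std::string open = "<p>";
    const std::string close = "</p>";
    const std::string nbsp = "&nbsp;";
    std::string out;
    std::string::size_type i = 0;
    while (i < html.size()) {
        if (html.compare(i, open.size(), open) == 0) {
            // Skip over blank content following the opening tag.
            std::string::size_type j = i + open.size();
            while (j < html.size()) {
                if (std::isspace(static_cast<unsigned char>(html[j])))
                    ++j;
                else if (html.compare(j, nbsp.size(), nbsp) == 0)
                    j += nbsp.size();
                else
                    break;
            }
            if (html.compare(j, close.size(), close) == 0) {
                i = j + close.size();   // drop the whole empty paragraph
                continue;
            }
        }
        out += html[i++];               // anything else is copied unchanged
    }
    return out;
}
```

For example, `removeEmptyParagraphs("<p> </p><p>Text</p><p>&nbsp;</p>")` would return `<p>Text</p>`. Paragraphs that contain only empty inline tags, such as `<p><span> </span></p>`, are left alone by this sketch and would need an extra pass.
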
vchris Posted November 15, 2006
Thanks a lot for even starting this project. You didn't have to do this. Nice work!

Skemcin Posted November 15, 2006
troll!