Stripping down someone else's HTML using JavaScript and regex


shaneH
06-03-2007, 03:09 AM
Ok, I may be in over my head but if you are willing to read this lengthy post and help, thanks.

If you read my post from about a week ago, you know I have a real pain in the neck situation.

I have spent the last week getting my JavaScript knowledge up to par, and now I need some help figuring out what is hopefully the last part of this site I am working on.

I am attempting to take the dynamically included content on this site and strip it down to more basic components, then reassemble it using my own styles and formatting.

Below is the code I am using. First is one of the functions called by the second piece of code, which is also included below.


function ServerStatusMenu(){
document.write ("<style type='text/css'>");
document.write ("A \{font-size: 12px; color: #000000; text-decoration: none; \}");
document.write (".Menu \{position: absolute; left: 5px; top: 5px; \}");
document.write ("</style>");

document.write ("<table border='0' class='Menu' cellpadding='0' cellspacing='0'>");
document.write ("<tr>");
document.write ("<td class='MenuCat' align='center'>" + MenusTitle + "</td>");
document.write (StartContent);
document.write ("<td class='MenuChoice' align='left'>" + NewContent + "</td>");
document.write (EndContent);
document.write ("</tr>");
document.write ("</table>");
}



<script type="text/javascript" language="javascript">
var MenusTitle = "<!-- Menu:Title -->";
var change = MenusTitle.split(":");
var loggedin = (change[0]);

var NewContent = "<!-- Menu:StandAloneContent -->".replace(/\r/gm, " ");
var StartContent = "<!-- Menu:StartStandAlone -->";
var EndContent = "<!-- Menu:EndStandAlone -->";

var StartLists = "<!-- Menu:StartItemList -->";
var EndLists = "<!-- Menu:EndItemList -->";

var StartItems = "<!-- Menu:StartItem -->";
var EndItems = "<!-- Menu:EndItem -->";

var ItemsLink = "<!-- Menu:Link -->";
var ItemsPopup = "<!--Menu:Popup -->";
var LinksDesription = "<!-- Menu:LinkName -->";


if (MenusTitle == "Logged Out"){
LoggedOutMenu()
}
else if (loggedin == "Logged in"){
LoggedInMenu()
}
else if (MenusTitle == "Server Status"){
ServerStatusMenu()
}
else {
AllOtherMenus()
}
</script>


Now, in case you didn't know, comments like <!-- Menu:Title --> are how the original designer/host of the site dynamically includes content. Furthermore, I do not have access to the original files or the database, nor can I use PHP.

Below I have included what the code above looks like once it has been run on the site, with the dynamic content included.


<script type="text/javascript" language="javascript">
var MenusTitle = "Logged In: Kriknosnah";
var change = MenusTitle.split(":");
var loggedin = (change[0]);


var NewContent = " <center style="font-size:8pt">
<i>Class Officer</i>
<a href='javascript:enablerankchallenge()' style='font-size:7.5pt;font-weight:normal'>Wrong?</a><br>
<A href="editprofile.php">Edit Your Account Info</a><br>
<A href="viewprofile.php?loginid=15">View your Account Profile</a>
<br><a href="myavailability.php">Edit your Availability for Raids</a>


</center>
<center style="font-size:8pt;font-weight:bold">My Character Profiles:</center>
<table cellspacing=0 cellpadding=0 border=0 align=center><tr><td class=plain style="font-size:7pt">
<a href='memberprofile.php?memberid=33'>Kriknosnah(70)</a><br><a href='memberprofile.php?memberid=42'>Zithril(50)</a><br> </td></tr></table> <center>
<a id=createchartext href="javascript:enablecreatechar(1)">[Create a Character]</A>
<div id=createcharform style="display:none">
<center>
Name:
<input type=text name=createname id=createname class=xsmall name=name value="" size=8>
<input type=button class=button value="Go" style="font-size:7pt" onClick="createchar()"><br>
<a href="javascript:enablecreatechar(0)">[Cancel]</a>
</center>
</div>
</center>
<center><a href="logout.php">Log out</a></center>
".replace(/\r /gm, " ");
var StartContent = "";
var EndContent = "";

var StartLists = "";
var EndLists = "";

var StartItems = "<!-- Menu:StartItem -->";
var EndItems = "<!-- Menu:EndItem -->";

var ItemsLink = "<!-- Menu:Link -->";
var ItemsPopup = "<!--Menu:Popup -->";
var LinksDesription = "<!-- Menu:LinkName -->";


if (MenusTitle == "Logged Out"){
LoggedOutMenu()
}
else if (loggedin == "Logged in"){
LoggedInMenu()
}
else if (MenusTitle == "Server Status"){
ServerStatusMenu()
}
else {
AllOtherMenus()
}
</script>


If you look at the "NewContent" variable you will probably guess my problem. I can easily change any part of the dynamically included content using regex. The problem is that the designer's code is not uniform: in one place he uses double quotes ("), in the next single quotes ('), and in yet another no quotes at all. So when I try to store his code in a variable, the quotes I need around it are overridden by the quotes in his code, and I cannot even get to the point where my code can start taking his code apart to make it look and work the way I want.

I am interested in finding another way to do this to get around the problem.

Also, any suggestions on doing this a better way overall are welcome.
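
The failure described above can be reproduced in isolation. What follows is a minimal sketch, not part of the original code: `parsesAsScript` is a hypothetical helper name, and the sample strings are shortened from the rendered output shown earlier. It uses the standard `Function` constructor only to ask the engine whether a piece of source text parses, without running it.

```javascript
// Hypothetical helper: true if `source` is syntactically valid as a function body.
// new Function() compiles the text without running it and throws a
// SyntaxError if the text cannot be parsed.
function parsesAsScript(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// A double quote inside the markup closes the string literal early.
var mixedQuotes = 'var NewContent = "<A href="editprofile.php">Edit</a>";';

// A raw line break inside a quoted string is also a syntax error.
var rawNewline = 'var NewContent = "<center>\n</center>";';

// Markup with only single quotes and no line breaks fits in double quotes.
var harmless = "var NewContent = \"<a href='logout.php'>Log out</a>\";";

console.log(parsesAsScript(mixedQuotes)); // false
console.log(parsesAsScript(rawNewline));  // false
console.log(parsesAsScript(harmless));    // true
```

Both problems appear in the rendered "NewContent" value: mixed quote styles and line breaks inside the string.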

JPnyc
06-03-2007, 03:07 PM
I'm not sure I understand all the restrictions you're working under, but can't you replace his script with yours?

shaneH
06-03-2007, 04:59 PM
Let me try to explain what I have posted above a little better.

The first 2 segments of code that I posted were both written by me; the third is the second segment after it has been run on the site. It is there just as a visual example of the problem I am having with quotes (" ").

To try and give you a better idea of what is happening, here is the original code that those first 2 segments will be replacing:


<table border="0" class="Menu" cellpadding="1" cellspacing="0" width="167">
<tr>
<td class="MenuCat" align="center"><!-- Menu:Title --></td>
</tr>
<!-- Menu:StartItemList -->
<tr>
<td class="MenuChoice" align="left">
<!-- Menu:StartItem -->
<li><a href="<!-- Menu:Link -->" <!--Menu:Popup -->><!-- Menu:LinkName --></a></li>
<!-- Menu:EndItem -->
</td>
</tr>
<!-- Menu:EndItemList -->

<!-- Menu:StartStandAlone -->
<tr>
<td class="MenuChoice" align="left">
<!-- Menu:StandAloneContent -->
</td>
</tr>
<!-- Menu:EndStandAlone -->
</table>


The purpose of the above code is to supply very basic formatting for the dynamically included content (<!-- Menu:StandAloneContent -->). The above code is itself dynamically included in the site's main page (<!-- System:Menu:Xxxxxx -->), where it displays all the menus for the site. Some of the menus are just lists of links; others are blocks of information that are retrieved from a database (I think) and displayed in the menu column, and some of that data can change as often as every minute or so. The first 2 segments of code in my previous post will eventually replace the above code.

My problem is that some of the dynamically included content comes with its own formatting, which will not work with my vision of the new site's design. So I am trying to get each piece of dynamic content into a different variable (or something) so that I can use regex and some other JavaScript to strip it down to its more basic parts (i.e. the second segment of code in my previous post). Then, using the first segment of code, I will create the formatting for each of the new variables for a given menu.

Now if you look at "var NewContent =" in the third segment of code (from my previous post), you will notice that the value of "NewContent" contains several double (") and single (') quotes, which gives me errors when I try to store it in a variable. I need to know if there is a way to overcome them.

My code in the previous post is not the final draft, as a lot more is needed for processing the dynamic content and reformatting it, but I am hoping it will be the basic components for processing the dynamic content. Any suggestions to improve it are welcome.

JPnyc
06-03-2007, 07:16 PM
You're right, this is a dinger of a problem. I'm afraid I don't have a ready solution for you. I'll give it more thought. It would be easy if the " and ' weren't mixed in a single line but apparently they are.

shaneH
06-03-2007, 11:07 PM
Yeah, you aren't the first person I stumped with that. I found out a friend's son does web design for a living, and after he got over how the site was put together he was lost on what to do as well.

I have been searching through code examples, books, and other forums for anything that would give me a clue in what direction I could go, and I have been drawing a blank.

Oh, how I long for a MySQL database and PHP. *sigh*

cgraz
06-04-2007, 12:25 AM
Is var NewContent dynamically created, say in PHP? Or is this all hardcoded in the individual files?

Could you str_replace() all ' with ", then surround the variable NewContent in single quotes? You'd then just have the issue of multi-lines within your variable definition, but you could replace all newlines to resolve that problem.
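
Whatever tool does that replacement, it has to run before the browser parses the script. Purely as an illustration of what such a step must cover, here is a sketch written in JavaScript. `toJsStringLiteral` is a hypothetical name, and it escapes the quotes instead of swapping ' for ", which is a slightly different technique from the one suggested above. It is not exhaustive (it ignores rarer line separators, for example).

```javascript
// Hypothetical helper: turn arbitrary markup into a double-quoted
// JavaScript string literal that is safe to place inside a <script> block.
function toJsStringLiteral(raw) {
  var escaped = raw
    .replace(/\\/g, '\\\\')         // backslashes first, so later escapes survive
    .replace(/"/g, '\\"')           // double quotes would end the literal early
    .replace(/\r\n|\r|\n/g, '\\n')  // raw line breaks are not allowed in a literal
    .replace(/<\//g, '<\\/');       // keeps "</script>" from closing the script tag
  return '"' + escaped + '"';
}

// Sample markup mixing both quote styles and a line break.
var sample = '<a href="logout.php" class=\'plain\'>Log out</a>\n<br>';
var literal = toJsStringLiteral(sample);

// The literal is now one line of source text.
console.log(literal);
```

Single quotes need no escaping here because the literal is wrapped in double quotes.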

shaneH
06-04-2007, 03:55 AM
Yeah, cgraz, that would be nice, but I am not able to use PHP.

In
var NewContent = "<!-- Menu:StandAloneContent -->"

the <!-- Menu:StandAloneContent --> is the dynamically generated content. Yes, it is a strange way to do it. I am assuming that the site's designer/host has to parse the entire page to find each occurrence of those types of comments, but that is what is being done, and I have no access to any of the site's underlying code or files, so I cannot call the dynamic content myself.

I have taken some of the dynamic code, such as the menus that are just links to other pages on the site, and hard coded it in the format that I need. But these last 7 or so "menus" (which are actually just small boxes of varying content) are what I have to get at so I can reformat them to work with the newer site's layout and design.

Hmm...

Forums have to deal with stopping hackers from posting malicious code into the site's database. You usually use PHP for that, but my thought is: it is coming from a form, and you process the values of $_POST, and JavaScript can process forms too. Can javascript get values into the $_POST and then strip it out from there? I'm not sure that question makes sense; I might have to think a little more to formulate a more intelligent question. But maybe it will get someone else thinking.

Weedpacket
06-04-2007, 07:18 AM
Can javascript get values into the $_POST and then strip it out from there?

Yes, but if it's to prevent malicious stuff being posted then there is no point; for the Javascript to do its job you have to rely on the good faith of your site's users to use it.

Me, I'm still playing catchup with your problem. You have a page that is expected to have these comments in, right? And some script over which you have no control takes your page, and replaces those comments with content over which you have no control.

I really don't think you'll be able to embed it straight into Javascript without being able to preprocess it first - as done by that script over which you have no control.

But there is another way of getting text into Javascript; it's an egregious hack but this whole system is one already.

<div id="StandAloneContentHolder" style="display:none"><!-- Menu:StandAloneContent --></div>
and then in the javascript

var standAloneContent = document.getElementById('StandAloneContentHolder');

At this point I guess you could use the innerHTML property to get raw HTML that you can throw regular expressions and whatnot at, or you can leave it as a DOM object and manipulate it at the element-and-attribute level.

Personally, I feel the latter would probably be more reliable since all the faffing about with single-/double-/un-quoted attribute values, empty elements, escaped quotes inside quoted strings and all that would not be an issue since you'd already be dealing with a parsed document fragment. But then, that does assume that the embedded HTML is consistent enough for you to be able to find the items you're looking for when programmatically dealing with the parsed structure.
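
A minimal sketch of the element-and-attribute route, assuming the hidden div above is already on the page. `collectLinks` is a hypothetical helper name, and rebuilding the links as a plain list is only one example of what could be done with them. It relies only on standard DOM calls (getElementsByTagName, getAttribute, createElement, insertBefore).

```javascript
// Hypothetical helper: read every link inside `container` into a plain record.
// It works on the parsed DOM, so the quoting style of the original markup
// (double, single, or none) no longer matters.
function collectLinks(container) {
  var anchors = container.getElementsByTagName('a');
  var links = [];
  for (var i = 0; i < anchors.length; i++) {
    var anchor = anchors[i];
    links.push({
      href: anchor.getAttribute('href'),
      // textContent is the standard property; innerText is the older IE one.
      text: anchor.textContent || anchor.innerText || ''
    });
  }
  return links;
}

// Browser-only usage: rebuild the links as a plain list next to the holder.
if (typeof document !== 'undefined') {
  var holder = document.getElementById('StandAloneContentHolder');
  if (holder) {
    var list = document.createElement('ul');
    var found = collectLinks(holder);
    for (var j = 0; j < found.length; j++) {
      var item = document.createElement('li');
      var link = document.createElement('a');
      if (found[j].href !== null) {
        link.setAttribute('href', found[j].href);
      }
      link.appendChild(document.createTextNode(found[j].text));
      item.appendChild(link);
      list.appendChild(item);
    }
    // The holder stays hidden; the rebuilt list is placed just before it.
    holder.parentNode.insertBefore(list, holder);
  }
}
```

The script has to run after the div has been parsed, for example by placing it below the div in the page, or getElementById will return null.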

shaneH
06-04-2007, 05:32 PM
Me, I'm still playing catchup with your problem. You have a page that is expected to have these comments in, right? And some script over which you have no control takes your page, and replaces those comments with content over which you have no control.

I really don't think you'll be able to embed it straight into Javascript without being able to preprocess it first - as done by that script over which you have no control.


LOL

Weedpacket, I don't know why I found that to be funny, but I couldn't stop laughing after I read it. Maybe this code is driving me crazy, who knows.

Well, it appears that last night I must have completed the picture for everyone I had asked for help, because what you suggested, Weedpacket, is exactly what several others have now suggested too. I will give it a shot and let everyone know how it works out.

Thanks

shaneH
06-07-2007, 04:35 AM
Just so everyone knows, Weedpacket's suggestion worked beautifully and I am now well on my way to making the site look like someone paid thousands for it. :D

I do have one minor issue: I think I have a small problem with my code.


if (MenusTitle == "Logged Out"){
LoggedOutMenu()
}
else if (loggedin == "Logged in"){
LoggedInMenu()
}
else if (MenusTitle == "Server Status"){
ServerStatusMenu()
}
else {
AllOtherMenus()
}



For some reason the above part of my code always calls the last function, the one in the "else { }". For the Login menu, for example, even though the values are equal (yes, I had the site show me the value), it still uses the function in the "else". I am sure it isn't the functions, because I switched them around and each one worked when it was in the "else". I suspect this is something fundamental, and as my knowledge of JavaScript is still growing, I can't seem to figure it out. Does anyone have any suggestions?

P.S. An example of one of my functions is in my first post here, if someone needs to see it.

shaneH
06-07-2007, 09:18 PM
Doh!!!

Never mind, spelling error.
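
The thread does not say which spelling it was. One mismatch visible in the posted code is that the rendered title begins with "Logged In" while the condition compares against "Logged in", and string comparison in JavaScript is case-sensitive. A minimal sketch of a more forgiving check, assuming the title format shown earlier; `menuKind` and its return values are hypothetical names:

```javascript
// Hypothetical helper: classify a menu title by the part before the colon,
// ignoring letter case and surrounding whitespace.
function menuKind(title) {
  var prefix = title.split(':')[0].replace(/^\s+|\s+$/g, '').toLowerCase();
  if (prefix === 'logged out') return 'loggedOut';
  if (prefix === 'logged in') return 'loggedIn';
  if (prefix === 'server status') return 'serverStatus';
  return 'other';
}

// The plain comparison fails on a single capital letter.
console.log('Logged In' == 'Logged in');          // false

// The normalized comparison does not.
console.log(menuKind('Logged In: Kriknosnah'));   // loggedIn
console.log(menuKind('Server Status'));           // serverStatus
```

The caller would then pick LoggedInMenu(), ServerStatusMenu(), and so on from the returned value instead of comparing raw strings.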