Thursday, March 30, 2006

Autosave vs. Persistent State

So the goal is to have SQL-based state. A tag in a hidden input field, registered by the base page, will be the unique id used to fetch this state. If the session is lost, the state can be recovered via this tag, so losing the session has essentially no effect (besides causing a reload from the database).
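A minimal sketch of that recovery flow, with a dict standing in for the SQL table (the function names and table layout here are my own illustration, not the real schema):

```python
# Stand-in for the SQL table keyed by (user_id, tag). In the real site
# this would be a database table, not an in-memory dict.
STATE_DB = {}

def save_state(user_id, tag, state):
    """Persist the working state under its unique hidden-field tag."""
    STATE_DB[(user_id, tag)] = dict(state)

def load_state(session, user_id, tag):
    """Return the state from the session if present; otherwise
    reload it from the database using the hidden-field tag."""
    if "state" in session:
        return session["state"]
    state = STATE_DB.get((user_id, tag), {}).copy()
    session["state"] = state  # losing the session only costs a reload
    return state
```

The point of the sketch: the session is just a cache, and the (user, tag) pair is the real key.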

But in the meantime I feel like I need a solution for saving. What I have currently built is an autosave mechanism that writes out autosave files when your session times out. The question is, is this enough? Probably. The files have the same format as a TOC file, and the filename can be rewritten and picked up by BBDataManager. Of course, the drawback is that the user has to ask for this.

My original vision (that I have moved beyond) was to allow this autosave to be accessible from the history page. Maybe I should look into that. But it is partially a management decision as to whether the current autosaving mechanism will be sufficient to appease Pearson.

Wednesday, March 22, 2006

Change of direction

Ok, got convinced that this would be hard. Talked to the boss and he seemed to agree with my temporary strategy of doing a simple autosave and saving this stuff till the back end redesign.

It's funny. I am so worn out that I can't help but sound sarcastic. I was trying to tell him that I was looking forward to redesigning the backend, but I sounded really sarcastic even though I really am looking forward to the challenge.

So why is what I was trying to do hard? One reason was a flawed approach on my part. I tried to take a series of tables that may or may not exist for an order and turn them into one big, flat session state. Unfortunately that still seems to make a lot of sense to me. I am slowly working towards having a set of tables in the data set. Yeah, 5 tables.

I think the basic problem is that you would need to change the database schema (albeit not in vast ways) and write numerous stored procedures. It would take some time and I want to get this stuff pushed forward and start to work on the back end.

Focussed thoughts on new state

Ok, I have had some time to think about how things will now work. I am going to add 3 objects. One is the DataObject. This holds all the state and is essentially a wrapper for a DataSet. This object will act as an intermediary between the site and the database. The neat thing is that session state no longer needs to be saved on a Session State server. A persistent hidden field (created by the base page) will track a tag (either an orderid or a fake ISBN). This tag, along with the user id, will be enough to recreate the DataObject. Normally you will just grab the DataObject from the session, but if you lose your session you can instantly reclaim it with the tag.

So all TOCs will be saved. This means the SaveTOC page becomes meaningless. But you now have to display the unique tag so the user can use it on the history page.

The history page also becomes much more important. Here you will be able to load a TOC you have created, but users might also want to manage the TOCs they have created. For example, they might want to delete or rename an order. Also, when they select an order, do they want to 'work' on it or generate a new copy?

So the second object will be a History Object which will also be a wrapper around a DataSet. It will handle the History interactions.

The problem is that the code is becoming more database dependent and my datalayer abstraction is dying a bit. This is ok, but moving to a new backend becomes more important and unfortunately, may generate more work. For example, I plan to power the DataObject with 9 stored procedures. Easy stuff, except that with our current backend these 9 will become about 200 stored procedures. Bleck!

Notes on a problem Vlad showed me with PageCount

Vlad showed me some database entries with bizarre page counts. I took a brief look at it. The early errors appear to be a problem in the site itself, and they seem to have gone away; they may be very early problems with the site that have long since been solved. The other discrepancy is that the page count reported by the public site does not include front matter (about 6+ pages), so on confirmation this page count changes. Confused me a bit. Regardless, I don't think this is a problem.

Tuesday, March 21, 2006

All sorts of new crap and musing on it

Ok, just had the weekly meeting. I mentioned the autosave problem. It turns out that Steve has been promising something along those lines. Anyway, we talked about it and the consensus was to repeatedly store the state in the SQL database. I am a little leery of this for a couple of reasons. One is performance. Consistently hitting the database isn't too bad with every page load, but if you do it on every AJAX interaction it may get taxing. The other problem is that this requires me to delve into the backend. I will need to muck with the database, which is what I wanted to avoid at this stage.

So I am getting rid of my Session state entirely. There will now be no reason to keep it around, since it will be loaded into and taken from SQL every time. However, now I will need to define the session information in the database. Since this has to be backwards compatible, it has to fit the definition inside one of 20+ databases. I can abstract this into a data layer, but it will still get annoying. Also I must be careful to batch work into big transactions, since they are more efficient. Luckily .Net supplies DataTable objects for this very reason.

Another problem is how to index 'state'. There are two basic ways that I see currently. One is to index via user. This would be great but a user can have multiple sessions. So you could index by session id, but if your session id disappears you need a way to get the data.

And how many autosaves do you allow? Do you allow one per session? Do you need to differentiate them? Luckily and unluckily I think I get to determine this.

Ok, it must index by session id. So in your SQL data you will have a user and a session id. So if you lose your session id, how do you get the data back? Well, by user of course. But if a user has multiple ('unfinished') sessions, how do you get the right one? One option is to mess up anyone who has multiple sessions open. Another is to make some arbitrary choice. Another is to offer the user multiple autosaves. If you start collecting multiple autosaves you will need a way to manage them (i.e. erase, rename).
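The indexing scheme above can be sketched with sqlite3 standing in for the real SQL backend (the table and column names are my own illustration, not the actual schema):

```python
# Sketch of the (user, session_id) indexing scheme. sqlite3 is just a
# stand-in here; the real site talks to SQL Server through the data layer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE autosave (
        session_id TEXT PRIMARY KEY,
        user_id    TEXT NOT NULL,
        state_xml  TEXT NOT NULL,
        saved_at   TEXT DEFAULT CURRENT_TIMESTAMP
    )""")

def autosave(session_id, user_id, state_xml):
    # Primary key on session_id: one autosave per session,
    # overwritten in place on every save.
    conn.execute(
        "INSERT OR REPLACE INTO autosave (session_id, user_id, state_xml) "
        "VALUES (?, ?, ?)",
        (session_id, user_id, state_xml))

def recover_by_user(user_id):
    # If the session id is gone, fall back to listing the user's
    # unfinished sessions so they can pick the right one.
    return conn.execute(
        "SELECT session_id, saved_at FROM autosave WHERE user_id = ?",
        (user_id,)).fetchall()
```

The primary key on session id encodes the "one autosave per session" choice; the user id column is what makes recovery after a lost session possible.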

So we will be getting rid of the SaveTOC page because it will become redundant. Each proto book can be identified with a 'draft TOC ISBN'. On the client side you can catch a session timeout and then report the ID and then redirect to the history page. You will need to be able to discard and rename.

Ok. It will be a Data Control object. This may end up being a DataSet. Changes to SessionOrder will instead change this. It will be heavily dependent on the database schema, which varies on a site-to-site basis. Oh well.... This is probably a better way to do it to keep with a persistent state vision.

Friday, March 17, 2006

Site Specific Subdirectories

One point I had not thought about for a while was having site-specific webpages in site-specific subdirectories. The difficulty is that you want the url to be fairly seamless. I solved this in my previous design, but in that system you never saw the url. So the potential solution I am currently considering is conditional redirection. When you get a url request you check to see if the page exists. If it does, you tack on 'site=XXX' for uniformity and pass on the request. If it does not, you rewrite the url to remove the extra directory and tack on 'site=XXX'. This, of course, may cause havoc with the paths. But it will probably be solvable havoc. This also reduces the generic nature of the redirection module, but since this module is a fairly small amount of code I am not too worried about it.
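The conditional-redirection rule can be sketched like this (the `page_exists` callback and the exact rewrite rules are hypothetical illustrations of the idea, not the module's real API):

```python
# Sketch of conditional redirection: pass through site-specific pages
# that exist, rewrite the rest to the generic page. Either way the
# 'site=XXX' parameter is tacked on for uniformity.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def rewrite_request(url, site, page_exists):
    """page_exists(path) -> bool: is there a site-specific page at path?"""
    parts = urlsplit(url)
    path = parts.path
    if not page_exists(path):
        # No site-specific page: strip the site subdirectory so the
        # generic page handles the request.
        segments = path.split("/")
        if site in segments:
            segments.remove(site)
        path = "/".join(segments)
    query = dict(parse_qsl(parts.query))
    query["site"] = site
    return urlunsplit((parts.scheme, parts.netloc, path,
                       urlencode(query), parts.fragment))
```

The "solvable havoc with paths" would show up here: relative links in the rewritten page still think they live in the site subdirectory.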

Thursday, March 16, 2006

Yippee!

Finally made my site live. Sent out email about it. Gavin already broke it using Safari. I don't really care too much about Safari, but I will put it on the bug list. Anyway it is nice to have it live. It does need work but it is a major step towards erasing the old crap.

Code Destroyer

So we should set up an archive. Not really to save anything, but as a place to stick stuff that is useless. There is a prevalent coding practice of never throwing stuff away. That is great if you have a source control system that handles it for you and quietly maintains history in the background. We just have random crap that no one knows anything about sitting around.

So we should set up a machine to act as a trash can - I mean 'old code repository'.

One line mistake

Okay, the engineers that came before me left a problem in the code. It is a simple problem to fix. It is a one line fix. But the problem was copied over and over again. Now I must fix 21 different sites.

This involves copying the code back and forth. Making loads of virtual directories so I can even rebuild the sites. And trying to stick by the bad versioning system set up by my predecessors. It is just keeping a copy of every version in some directory. Old versions get cleaned up when we run out of space. And the version used gets copied to a 'live' directory. And the multiple strange versions with Ds, _s, Exs, and whatever after them.

So frustrating!!! This should take 5 minutes not all day!!!

Wednesday, March 15, 2006

Custom ID problem

One design flaw in the original that I am trying to correct is the custom id. A custom id was generated the old way based on the size of a list. It was made to be negative and within a certain range so no overlap would occur with non-custom ids. This is pretty bad. Also, because custom chapters are given IDs in the database, it gets weird: newly loaded custom stuff is either given an overlapping id, has its ID mutated in some way (made negative), or has a newly generated ID. I forget which. Anyway, I am hoping that in the future IDs can be made unique by making them strings. I haven't done this yet, again because Back-End stuff may affect it. Currently IDs can overlap, but not within a type. So when it is possible to get the wrong ID (for instance when removing), the type is also checked.
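One way string IDs could kill the overlap problem is by folding the type into the ID itself, instead of the negative-range hack. A tiny sketch (the `type:number` format is my own illustration, not a decided scheme):

```python
# Sketch: encode the item type into a string ID so IDs are globally
# unique and no negative-number tricks or separate type checks are needed.
def make_id(item_type, raw_id):
    return f"{item_type}:{raw_id}"

def parse_id(unique_id):
    item_type, raw = unique_id.split(":", 1)
    return item_type, int(raw)
```

With this, "custom chapter 5" and "catalog chapter 5" can never collide, so removal no longer needs the extra type check.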

So not a great situation, but easy to fix.

Some Order Notes

I am in the process of moving Order==>XML code from the pages to the order, and I was about to erase a comment. But I wanted to preserve the gist of it. The XML format for an order is designed to match the database schema. So putting the XML code in the order somewhat violates the data layer abstraction, but I can live with it. If the DataLayer is ripped out, the Order object may need to be changed as well.

Saving State and Order Serialization

Okay. One of the challenges that I feel this site needs to address is saving state. Professors spend a long time building a book, and this leaves them vulnerable to session timeouts. And if they have spent a long time building a book and then lose it, they get pissed.

So the solution is to catch a session timeout and save their data. This turns out to be tricky. My first thought was to use Session_OnEnd or whatever. This turns out not to work. Session_OnEnd is managed by the worker process, and when session state is managed by an outside server (as it must be on a farm) IT IS NOT FIRED. Toss away this idea.

The next idea is to use the ApplicationCache. It is a handy-dandy application-wide cache. You can toss crap into it with simple timeouts, or toss in event handlers to be fired when a timeout on an object occurs. I used it to cache various configuration things and to reduce database hits for things like parameters or suggested tables of contents. So you could use it for session state too. You just pop all the state you are keeping in session into the application cache every time you access your session. You set the key as the session id. You set the timeout as your session timeout. And you set up a nice little timeout handler which will save your session state if it times out in the application cache.
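The mechanism looks roughly like this toy sketch (the real .NET Cache fires removal callbacks natively; this stand-in polls with a `sweep()` call instead, and all names here are illustrative):

```python
# Sketch of the cache-with-expiry-callback autosave trick: every session
# access re-inserts the state with the session timeout. If the entry
# expires before being refreshed, the callback fires and saves the state.
import time

class ExpiringCache:
    def __init__(self):
        self._items = {}  # key -> (value, expires_at, on_expire)

    def insert(self, key, value, timeout, on_expire):
        self._items[key] = (value, time.monotonic() + timeout, on_expire)

    def sweep(self):
        now = time.monotonic()
        for key in list(self._items):
            value, expires_at, on_expire = self._items[key]
            if now >= expires_at:
                del self._items[key]
                on_expire(key, value)  # autosave the orphaned state

saved = {}
cache = ExpiringCache()
# Keyed by session id, timeout = session timeout (tiny here for the demo).
cache.insert("sess-42", {"toc": ["ch1"]}, timeout=0.01,
             on_expire=lambda k, v: saved.update({k: v}))
time.sleep(0.02)
cache.sweep()
```

After the sweep, the expired session's state has landed in `saved` without the user doing anything.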

Ok, this gets a little tricky. I mentioned farm up above. It is possible that the thread handling this session will be run on different machines WITH DIFFERENT APPLICATION CACHES. Damn! But the timeouts will still fire and your data will be saved. In fact it will be saved at strange and unknown points. But since saves overwrite each other, the last save should be the valid one. (Am I angering the god of strange timing events?) Of course, you will not be able to remove stuff from this multi-machine virtual application cache, so it will save your session state regardless. Even if the prof saves the order normally, session state will be saved and YOU CAN'T STOP IT. So even profs who never time out will end up finding little autosave entries. This is probably ok...

Also the whole thing reeks of timing problems. I can't (subconsciously don't want to) think of any really bad occurrences, since you make no real guarantees besides a best effort to save state.

Ok, this leads to another design flaw I built in. The original site built an xml document in the page code that represented an order. In a moment of dumbassery I copied this structure, even though the ability to translate itself into XML should be part of the Order object. So when I came to saving state I had to decide how to actually save it. It should basically use the same system as saving a TOC, but it should use a keyword like 'SAVED' instead of an ISBN, and it should always overwrite 'SAVED'. This led me to my previous design error. It also brought up another problem: in the current setup there is a delay between saving and seeing that your object is saved. So 'SAVED' data may not be seen.

I could live with this error, since again it is a 'hopefully the back end redesign will fix it' problem. But will the BBDataManager support a saved TOC and overwriting? If not, I will have to create another solution. If it does, lots of useless save files will be tossed into the xml file repository, but then you could see just how often the Application cache was trying to save....

Yeah. I will have to come up with another solution.

Probably an ugly function that directly saves the orderinfo into the database. This should go into the data layer so that it can be redone later.

So what should I do? I will shift the XML saving stuff into the Order object, allow directly placing an Order object into a Message, and then build a data layer function that can save a TOC and basic info into the order table and can overwrite based on user identity and the SAVED isbn.
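The "Order serializes itself" part of that plan can be sketched like this (field names and the exact XML shape are illustrative; the real format matches the database schema):

```python
# Sketch: the Order object owns its own Order==>XML translation, so
# pages no longer build the document by hand.
import xml.etree.ElementTree as ET

class Order:
    def __init__(self, user, isbn, chapters):
        self.user = user
        self.isbn = isbn        # a real ISBN, or the 'SAVED' keyword
        self.chapters = chapters

    def to_xml(self):
        root = ET.Element("order", user=self.user, isbn=self.isbn)
        for ch in self.chapters:
            ET.SubElement(root, "chapter", id=str(ch))
        return ET.tostring(root, encoding="unicode")
```

With this in place, the data layer function just asks the Order for its XML and upserts on (user, 'SAVED').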

How my data layer abstraction got messed up

So I made some mistakes in the design. The design is broken into 6 parts: Pages, Order Information, Message Objects, a Data Layer, and two conceptual sub-layers of SQL and XML. Pages only really interact with the Message Objects, which are fed to the DataLayer, which spits back a responding Message.

All good so far, but how I encapsulated data in the message was bad. I originally envisioned the message as a light wrapper around an xml document. Most of the code is written this way, and it often just rips the xml out of the message. But then I switched my session state mode to 'Server' instead of 'InProc'. The difficulty was that I was saving search results in a non-serializable XmlDocument. When I looked at the code, a DataTable was being transformed into an XmlDocument to make it a Message. Not horrible, but transforming it back and forth for serialization would be bad. So I basically just tacked the DataTable onto the Message, and one part grabs a DataTable out of the Message instead of getting an XmlDocument. This reduced data transformations, and since DataTables are serializable it fixed the problem.

Course now the abstraction is fucked. So the questions are 'What exactly is inside a Message?' and 'How do Pages interact with a Message? Do they know how the data is stored inside?' The best way to answer these revolves around how the back end gets rebuilt. Of course, the decision doesn't require the back end to be working, and the whole Data Layer will most likely be replaced. Perhaps I should not worry about it until I rebuild the Data Layer to support a more modern back end.

I could make Message a self-contained object so that the Pages don't need to know whether a DataTable or XmlDocument is inside it, but the work to value ratio is pretty low on that. Why build some crap like that when you have DataTable and XmlDocument that do all sorts of crap for you.

Thoughts on Validators

I have two pages requiring extensive validation, Orderinfo and Deskcopy. I have used .Net validators in Orderinfo. They are very handy but, like most .Net tools, are very geared towards specific uses. So you end up modifying your code to be more like those cases instead of how you want it to be. Another problem I had was with Display="Dynamic". It didn't seem to work some of the time. The whole client-side validation seemed slightly flaky. Asterisks would sometimes stay put and sometimes shift around. And for the phone and fax numbers there would be 1-3 asterisks, since each number was 3 separate controls. I suppose the right thing in this case is to make each number a single control. The only reason I haven't done that is because I have been trying to 'stay true' to the design of the original site. But again, this is an example of having your code forced into a pattern that Microsoft likes.

So with Deskcopy the problem is slightly different. It requires conditional validation. Conditional validation is no problem IF you have the condition change trigger a POST. The page as written has no POST here, since the fields appear via Javascript and DHTML. So to use the validators I would again be forced to alter my code in a more Microsoft-friendly direction. Or I could use Javascript validation. This, of course, is less secure since it can easily be gotten around, but in this case I don't care. But the greater sin is that it would then no longer match the Orderinfo page. *sigh*

I will put it up without validation on this page for now and either use Javascript validation for both pages or do a POST when the checkbox on Deskcopy is checked to enable the Validators. The second way is probably easier, cleaner, and more secure, but at the cost of a page reload.

When Ajax-style technology is more heavily incorporated it will be nice. Before that you are going to have low-performing pages that flash and reload without any purpose. And .Net tries to guide you to that style of doing things. Probably because their product lives on the server and client-side scripting is beyond their dominion.

Success on File Downloads

At first I thought that the file download problem I was experiencing was caused by an interaction of a download with my custom redirection module. I tried building the simplest error case and discovered I was completely wrong. The default settings limit the file uploads accepted by .Net to 4MB. So changing this fixed the problem. Now there are two file size checks, which is a bad idea, but I am not planning on doing anything about it now.
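For the record, that 4MB default comes from the httpRuntime maxRequestLength setting in web.config, which is measured in KB (default 4096). Raising it looks something like this; the 20 MB value is just an example, not what I actually set:

```xml
<configuration>
  <system.web>
    <!-- maxRequestLength is in KB; the default is 4096 (4 MB). -->
    <httpRuntime maxRequestLength="20480" />
  </system.web>
</configuration>
```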