Soft trial

I'm toying with making some articles in Cites & Insights additionally available in very simple HTML form.

I'm not sold on the idea. The reasons I give in the FAQ for using PDF continue to be valid. The trial run I'm mentioning here even validates one of them: despite using the most space-efficient (and somewhat hard to read, since the lines are so wide) HTML options, the articles combine to require more than twice as much paper as the issue does: 50 pages as compared to 24. (Yes, some of that's because of repeated headers and footers, but I'm not going to put articles out there without the surrounding material.) I also think the HTML form is a whole lot less readable and attractive, at least for print readers.

But I'm willing to give it a try, if I can do it without significant software investment or needing to take more than an extra hour per issue doing my least favorite part of C&I--that is, screwing around with HTML and postings to get the word out.

The methodology I used for this trial does appear to take about an hour to handle a typical issue's worth of articles, and used the cheapest software I could find that would handle copied Word text reasonably well. (It was a $5 CD-ROM that turned out to be a little more than just a web editor. If I turn this trial into a real feature, I'll mention that story in Bibs & Blather.) "About an hour" is without attempting to turn any URLs into live links, fix any cases where I've inserted a blank to make a URL break lines, or really do anything other than copy, paste, and mass-replace typeface indications.

Anyway: If you're interested--I'm only going to publicize this here and at the C&I Updates blog--here's what you do:

Go to the C&I Tables of Contents form, click on 2005, go down to the latest issue. You'll note that each article name is a live link. Try a couple of them.

Let me know what you think: Is this--

  • Pointless?
  • Pointless unless I make the separate articles a whole lot nicer?
  • Worth doing without any extra tweaking?
  • Worth doing, but you'd suggest a tweak or two that won't require real work on my part?

Comments either here or to me, wcc at Comments by this Sunday, please: If I decide to do this for real, I'll try to back-convert this year's issues before 5:4 comes out (late February), then back-convert each previous volume--selectively--over the next month or four.

Modified to correct links...


I know you'll have at least one fan of that format.
Me, I'm happy with PDFs. They print nicely and I can read them at my leisure rather than having to change screens to assist a patron and losing my place. I guess I'm just an old-school, hard-copy-loving Luddite :)

just my 2¢

And PDF isn't going away: That will continue to be the primary distribution mechanism. So no worries there!

It's even possible that, if I do go to a "selective HTML copy" model, I'll delay the HTML versions...or that I won't do HTML for any story longer than X words, or... whatever.

I'm probably more likely to blog about something if I can link directly to an article in html format.

Kvetch one. Who's gonna print these? Seriously? When there's a PDF available? Right. Nobody. So why worry about it? Format for people like me who read onscreen. And do it with CSS, not font tags. If you need help, just ask; this is a relatively easy job.

(And if you must also format for print, do it the way $DEITY and standards-aware browsers intended -- with a stylesheet designed for print pages!)

Kvetch two. It's not the ugliest HTML code I've ever seen. (You wouldn't BELIEVE the ugliest HTML I've ever seen. Heck, I don't believe it, and I saw it!) But it ain't exactly pretty.

I thought about recommending a regular expression engine -- I'd write the regexes for you, even! -- but no, I think the thing to do is grab HTML Tidy and use that. Bung in a link tag or two to your stylesheets, and you should be all set.

You'll have to cruise about for Windows front-ends to Tidy, but this looked more or less usable. (Well, less, actually -- but if you don't know what a checkbox means, leave it unchecked and you're probably okay.)

I'm glad you're considering this, though. Reading PDFs onscreen is awful.

I didn't think I was formatting for print. I do think I'm not about to cope with full-scale HTML editing or learn a new editor: I don't have the time or energy. Unless HTML Tidy retains italics and bold, and either turns smart quotes and dashes into the appropriate web operands or converts them automatically to inch signs and double hyphens, and unless it retains heading structure, it won't work for me either: The whole idea is to take the Word chunk, import it into the header/footer pair, do at most one global replace (in this case, adding ", Palatino, serif" to Book Antiqua to be nice to Macs and people with no Palatino-variant--not that I'm that hot for Palatino, but it's better than TNR), fix the title metadata, and save it.
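For what it's worth, the two mechanical pieces described above (swapping smart quotes and dashes for their web equivalents, and the one global font replace) are small enough to sketch. This is only an illustration, not what the editor or HTML Tidy actually does: the function names are made up, and the `face="Book Antiqua"` attribute form is an assumption about what the pasted markup looks like.

```python
# Minimal sketch; helper names are hypothetical and the font attribute
# format is an assumption about the editor's output.

# Smart punctuation mapped to standard HTML named entities.
SMART_TO_ENTITY = {
    "\u2018": "&lsquo;",  # left single quote
    "\u2019": "&rsquo;",  # right single quote / apostrophe
    "\u201c": "&ldquo;",  # left double quote
    "\u201d": "&rdquo;",  # right double quote
    "\u2013": "&ndash;",  # en dash
    "\u2014": "&mdash;",  # em dash
}


def to_entities(html: str) -> str:
    """Replace each smart character with its named entity."""
    for char, entity in SMART_TO_ENTITY.items():
        html = html.replace(char, entity)
    return html


def add_font_fallbacks(html: str) -> str:
    """The single global replace: give Book Antiqua some fallbacks."""
    return html.replace(
        'face="Book Antiqua"',
        'face="Book Antiqua, Palatino, serif"',
    )
```

Both steps are plain string replaces, so they leave italics, bold, and headings alone, and running them a second time changes nothing.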

This is purely an extra. Even that, and tagging the ToC appropriately, is more work than I want to do. The cheapo/easy "editor" I used for this lets me do the simple import. But it also enforces paragraph-by-paragraph font-and-size tagging: Change it to global level, and it's right back at paragraph level the next time you switch to HTML view. I know; I tried. I'm guessing it's not going to play nice with CSS--and unless the Word styles (which this editor converts to formatting) somehow automagically turn into CSS styles, you're talking about redoing markup, at some level, for each article. (That whole slew of table-related stuff up at the top comes directly from the editor as well, and automagically reappears if you delete it.)

Maybe I picked up the wrong tool. I know the text editor we use at work would also be the wrong tool. I've used one of the major Web editors enough to know that it would require a lot more work on an ongoing basis (and a lot more up-front expense) than I'm willing to do--and, based on the results I see, even then it would need continual tweaking to get it right.

You seem to be suggesting picking up one new tool that will handle one piece, then another new tool that might handle another piece, then learning enough about CSS to put it together (or letting you do it), then figuring out how to tag the styles without effort, then...

If this isn't workable on a KISS basis, it isn't going to happen. And if the KISS version is ugly HTML or whatever, then maybe it shouldn't happen. "Simple" in this case means minimizing the use of my at-home time and energy, time and energy that would otherwise go to writing, reading, or relaxing.

Is that link broken? Is the server down? I tried this afternoon from work and now from home and it won't come up for me.

The writer's broken. Duh. It's "citoc.htm," not "toc.htm"--and "cifaq.htm" rather than "faq.htm" earlier on. I've fixed it. Fixing my head...that may take longer.

OK, so tangognat is right, bloggers are probably more likely to blog about something if they can link directly to an article in html format. That's something to keep in mind.

cavlec might have a point about the HTML not being printed, or at least not very often. It's not the ugliest HTML I've seen either, far from it. I wouldn't worry about it.

IF and only if it's not much work for you to get this done, I think it's a great idea to have the alternate format. Like mdoneil said, you'll make one person happy ;-) I welcome the ability to have links to each article.

If it's a real pain in the ass just don't worry about it.

Okay, let's take that a piece at a time.

Tidy shouldn't mess with paragraphs, header levels, or italics/bold. You are right, however, that it won't necessarily leave hooks for all the CSS you might like. So if it's too much, okay -- but it might not kill you (or me, for that matter) to try it and see what happens.

If I were doing this, I'd use the Compact HTML export from Word (silly plugin from Microsoft -- I can find it for you if you like), and then turn my homegrown regular-expression search-and-replace engine on the result. The way that works is that I come up with a canned list of search-and-replaces that (once they're correct) I just run, no muss, no fuss, NO TWEAKING. All I'd have to do then is add issue- or article-specific metadata (if any).
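To make the "canned list" idea concrete, here is a rough sketch of what such a search-and-replace pass could look like. It is not the homegrown engine mentioned above; the rule list and the `apply_rules` name are invented for illustration, and the patterns assume the exported HTML carries per-paragraph font tags and empty spacer paragraphs.

```python
import re

# Hypothetical canned rules: (compiled pattern, replacement), run in order.
RULES = [
    # Drop every opening and closing <font ...> tag, keeping the text inside.
    (re.compile(r"</?font\b[^>]*>", re.IGNORECASE), ""),
    # Remove paragraphs that hold nothing but whitespace or a lone &nbsp;.
    (re.compile(r"<p\b[^>]*>\s*(?:&nbsp;)?\s*</p>\s*", re.IGNORECASE), ""),
    # Trim trailing spaces and tabs on each line.
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),
]


def apply_rules(html: str, rules=RULES) -> str:
    """Run every rule over the text, in order, with no per-article tweaking."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    sample = '<p><font face="Times New Roman" size="3">Hello <i>world</i></font></p>'
    print(apply_rules(sample))
```

Once a list like this is right for one article, the same call runs unchanged on the next; only the article-specific metadata would still need a hand edit.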

You wouldn't want my homegrown regex engine, but there are analogues to it (okay, okay, improvements on it) out there. Or if you're a masochist you can probably script your text editor to do the work. (Seriously do NOT recommend attempting this with UltraEdit, however. The scripting engine is half-lunatical, and the regular expressions are broken.)

CSS is also a do-once; once you're happy with it, you don't have to do *anything* on a per-article basis. You just refer to separate CSS files from each article file, and such a reference can be inserted via search-and-replace; it's as canned as everything else.
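As a sketch of how canned that insertion can be: the snippet below adds stylesheet references just before the closing head tag. The file names `article.css` and `print.css` are placeholders, not files that exist anywhere, and `add_stylesheets` is an invented name.

```python
import re

# Hypothetical stylesheet names; one for screen, one for print.
LINK_TAGS = (
    '<link rel="stylesheet" type="text/css" href="article.css" media="screen">\n'
    '<link rel="stylesheet" type="text/css" href="print.css" media="print">\n'
)

HEAD_CLOSE = re.compile(r"</head>", re.IGNORECASE)


def add_stylesheets(html: str) -> str:
    """Insert the link tags before the first </head>, once per file."""
    if 'rel="stylesheet"' in html:
        return html  # already linked; leave the file alone
    # A function replacement avoids any escape handling in the inserted text.
    return HEAD_CLOSE.sub(lambda m: LINK_TAGS + m.group(0), html, count=1)
```

Every article then points at the same two files, so a change to the look happens in the stylesheet and never in the articles.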

Web editors? Pfeh. If you mean D---mw----r, I can't imagine a greater waste of time.

Believe me, Walt, I'm a terribly lazy person. I hate hand-tweakage. I do this stuff the way I do because I hate hand-tweakage. I'm not kvetching out of pure markup-snobbism; I just sorta hate the idea of your HTML presence not being as polished and presentable as your PDF, when it could be at a fairly minor up-front time-cost and essentially zero ongoing time-cost.

(In Firefox, by the way, every other paragraph in the article I looked at was in a different serif font. I think it flipped between Times New Roman and something Book-Antiqua-ish, but I didn't look too closely.)

If that doesn't matter to you, then by all means ignore me. 'Cuz you will anyway, as is your perfect right. :) I tell you what, though: put a CC license on a couple of articles so that I can download, mess with, and mirror the code. Just for fun, hm?

You may be right, and maybe I'll look at that more carefully. The process you describe doesn't sound too awful. Maybe.

In the meantime, every one of those articles has the same BY-NC license as C&I as a whole, although it may not be expressed properly in the footer. (That, I can fix.)

If Firefox is flipping between the stated Palatino-equivalent and TNR, then the web editor is doing even worse than I thought. I checked a couple of articles, but not all of them. The editor seems to insist on a default of TNR. I haven't seen the situation in my copy of Firefox. But then, except in peculiar situations, I tell both my browsers to ignore fonts from sources anyway: I really and truly dislike Arial/Helvetica, and it seems like 95% of "designed" websites use it. (I turned off the override to test a couple of stories.)

I'm very much reminded of a comment Blake left on a story regarding ExLibris. I'm also in awe of people who do lots of writing, lots of speaking, *and* are always eager to learn new web and other languages and tools to do all that stuff. I've learned a few programming languages over the last too-many years (38 full time, I guess), but primarily because those languages allowed me to do my work. The nice thing about contemporary PCs is that I can mostly just use them as tools: I drive my car without knowing how to modify an internal combustion engine; I write in Word (and can establish alternate templates for the same documents--which is the one step I left out in doing the HTML version) without needing to know, for example, how it's designed internally or how to write macros. And I've been able to build a modest website--no, make that two modest websites--without needing to spend lots of time or money learning the internals of CSS or becoming an HTML hot-shot. Acrobat and Distiller mean that I can produce reasonably professional issues of a publication without learning or paying for another desktop publishing system (I used to be hot stuff with Ventura Publisher, but under Corel's care that became so unstable that I had to give it up).

So I'll say "maybe"--but I hear regex, and script editors, and the number of tools that appear to be involved in a sequence of steps...and my first reaction is to run and hide. Or sit and write. With words, I'm no guru, but I'm a proficient hack.

Thanks. If I think it's going to be a pain in the ass, it won't happen. The current "technique" isn't all that bad--but may or may not be doing what I want. I mostly, at this point, want to focus on reading, writing and relaxing, and to add the feature only if it's very low overhead.

And, frankly, I don't want to add it in a manner that encourages people to abandon the whole issues for a set of articles instead. Maybe that's pride, but there it is.

So we shall see. Interesting range of responses so far. I could broaden the range by doing a Topica post, but somehow I'm assuming (perhaps wrongly) that people who prefer email notifications to aggregator entries are probably also people who are just fine with PDF.

Sigh. A recheck (with my "regular serif" face changed to Engraver and with "use my fonts all the time" turned off) shows that you're right about the switching--although it's mostly within paragraphs, and it's always to "whatever," that is, "serif" or just the browser's default proportional typeface.

Which means that this editor, which should have the credentials for retaining Word formatting, and which certainly imposes hundreds of font tags, isn't doing the job as well as it should.

Some fun.

I'm still doing e-mail notifications for C&I, mostly on the "if it ain't broke, don't fix it" principle (why unsubscribe, then resubscribe in my aggregator, when I see notifications in e-mail just fine?). I'm also an on-screen reader for the most part, and find the kludged HTML version (even with the font shifts in Firefox) much easier to read than the (admittedly much prettier) PDF. If both versions were consistently available, I'd consistently choose HTML (unless there were an item I wanted to print). I also appreciate the ease of linking to individual HTML articles rather than an entire issue; it might be an interesting experiment to advertise this more widely and see whether it increases linking and visibility.

OK; maybe I will do a quick Topica post. How bothersome the font shifts are--they should actually be there in any browser; it's an HTML problem--depends on your default typeface.

This is turning out to be a more difficult situation than I expected...and I'm not entirely sure how I feel about being the subject of a long essay at CavLec. Part of me sort of wishes I hadn't had the thought...