Blake's blog

69 And Sunny: In BUFFALO?

Right now it's 69 degrees and sunny here in Buffalo, in the middle of January. We're warmer than Atlanta, DC, Dallas, San Francisco, Seattle, and Beverly Hills, CA. We have almost twice as many degrees as Boston, so for those of you @ALA, be sure to recommend Buffalo for next year.

It should be about 50 degrees colder by tomorrow night, and plenty of snow is on the way, but for now, we're all out at the beach working on our winter tan.

Go Danny Go!

Good News for my favorite grocer here at home. Fortune magazine named Wegmans Food Markets No. 1 on its annual list of the "100 Best Companies to Work For."

Not at all surprising to me; I've heard nothing but good things from everyone I know who works for them, and I'm always treated well by the employees.
The Wall Street Journal covered them a few years back and said something like "we suspect many people would live in a Wegmans given the chance."

One Last Look at 2004

Just one last look at the numbers from last year: these are some of the nice, pretty Urchin graphs built from the Apache server logs. First, the year's summary.

Three Ways To Browse Last Year's Stories @LISNews

I spent some time browsing through all the stories we posted this year to see what we've been up to. Turns out we've been busy! So here's my take on some of the stories we posted, some of my favorites, and various other things I could come up with. Feel free to refresh your memory by browsing all of the LISNews from 2004:

By date, By Hits, By Comments.

What were your favorites? What would you like to see more (or less) of in the coming year?

I started last year by looking back at the numbers, something I'll do for '04 soon.

Early in the month, the big news was that Google was going to go public. At this point, if you bought in at any price, you're still ahead of the game. Care to place any wagers on the share price of Google a year from now?


"Gay books" started popping up early in the year,
a topic that would prove to be rather popular. Exposing Kids to Gay Themes in Library Books was the first story of the year to receive more than 10 comments.


Also popular in January were the almanac threads (after the FBI warned police to watch for folks with suspicious almanacs), ebooks, Cuba, censorship, and the Patriot Act, and the Library Cat saga over in CA began.


Things to do With LISNews to Kill Time on Friday
was my favorite post.



Total number of comments: 495

Total number of commentors:

Total number of stories: 265

Total number of submissions: 167


February saw a verdict in the Cat Case (Dog owner loses cat fight), some coverage for Janes' Google class (something I used quite heavily this fall in my class) in Google for a grade: UW class to study popular search engine, and a hint of things to come later in the year with Google's first digitization project, Project Ocean: Stanford University And Google. Google stories were popular throughout the month of February. One thing that really stands out for me is the number of authors that posted this month. The diversity and depth of stories is great thanks to the wide range of authors, something that hasn't been repeated since.


Environmental Scan report should have been on everyone's required reading list of '04.


of Algorithms
Board to Resign Due to High Journal Prices
was one story
of note in the Open Access movement that could be a sign of things to come as
journal prices continue to climb out of reach.


SHUSH--for the Conservative Librarian: purple, ugly and angry, Shush ushered in the Conservative Coven that would dominate many of the comment threads throughout 2004 and prove to be quite a popular topic for discussion in the coming months.


Over the "Graying" Profession Hype
was an interesting read. It
still surprises me that the ALA
is spending so much time and money trying to recruit more librarians.


Other heavily covered topics in February included blogs, sales, WWW, and technology.


you on the RIGHT side? touched off a firestorm. To this day I'm still amazed at how some folks interpreted this post. Publicly and privately, I was taken to task for even suggesting that conservatives deserved a place to be heard.




Total number of stories: 443

Total number of submissions: 276

Total number of comments: 739

Total number of commentors:


March started, and ended, on political notes. SCO, filtering
laws, the looming election, privacy and censorship all started to become hot
topics. Study Finds
a Nation of Polarized Readers
was an article that could be applied to
LISNews as well.


One of the more bizarre stories of the year showed up in March: School librarian suspended for taping students' mouths.



Total number of stories: 348

Total number of submissions: 259

Total number of comments: 1015

Total number of commentors:



On to April, the month o' Koha: Koha 2.0 Released.

Cheney's Lesbian Novel Reissued was just one of the politically charged stories that made the hits list. The Cleveland library strike was also in the headlines for much of the month. Police arrest sleeping library thief was one of the dumb library criminal stories that would surface from time to time throughout the year. Google, the PATRIOT Act and filtering were all popular throughout the month.



Total number of comments: 1117

Total number of commentors:

Total number of stories: 364

Total number of submissions: 262


April showers bring May flowers… Drunken Birds Crash Into Library Windows

Those showers also brought the attack of the birds.


fight library's gay-themed books
turns up on the most popular stories ever
list, thanks to almost 100 comments.


May was also the month I went begging for bucks. I had just
moved all the other LISHost clients to a new server and couldn't (and still
can't) afford to pay the $180 a month to host LISNews.


Shortly after that we learned Rory was taking me to task for my open-door policy: Rory Litwin critical of LISNews received a record ~11,000 hits and ~150 comments.

Litwin responds to LISNews criticism was his follow-up. He's since cut all ties to our site and hasn't been heard from again. A truly awful ending to this one.


Other stories of note include the firing of the

directory, the final grunt from Ref Grunt, and Seattle's
sparkly new library.



Total number of comments: 1293

Total number of commentors:

Total number of stories: 287

Total number of submissions: 192


And on to June. We crept into summer here in the Northern Hemisphere, but that didn't slow things down. Book thefts, the Google spam contest, and blogs all made headlines. Books were voted on and off library shelves, the E-rate abuse problems started showing up, and the search engine wars 2.0 took off.


A standout for stupidity: This University President Tells Faculty Senate Going To Library Is A Waste Of Time. And one for smarts: lied to him in library school.


A couple of standouts, for me, were LII: Be Sure You're Getting the Real McCoy and PUBLIB list of things making libraries look stupid.



Total number of comments: 1327

Total number of commentors:

Total number of stories: 342

Total number of submissions: 285


July saw the end of "Anonymous Patron" @LISNews. The archives started to fill up at the new Clinton library, Fahrenheit 9/11 started to make headlines, and "The Librarian" was announced. Google was busy doing noteworthy Google things, and there were dumb governor tricks, as in SD governor pulls plug on part of library Web site. Someone dared question The Cat in the Hat with Do Dr. Seuss books really promote reading? And then there was Library toe-sucker arrested. It was a strange month.



Total number of comments: 1281

Total number of commentors:

Total number of stories: 393

Total number of submissions: 313


For Bush/Cheney '04 took to the web in response to Librarians Against Bush.

I'm not really sure which side won at this point. Politics was a popular topic throughout the month; Sandy Berger, Michael Moore and the other usual suspects all made the headlines often. Other names included Charley and the gang of three hurricanes that beat Florida libraries into submission. Filtering and Laura Bush were also seen on our pages often.


I tried again with The LISNews QuickSubmit: The Best Thing Since Sliced Bread.

It's still not being used, but at least we're getting a good number of submissions.



Total number of comments: 972

Total number of commentors:

Total number of stories: 321

Total number of submissions: 257


Back to school time, September. Library Jokes from Kids, Google's birthday, LII, ResearchBuzz, OCLC, Laura Bush, books, censorship and filters all made up a rather quiet month.

Some of the more interesting stories included Librarian Jessamyn West Profiled by Wired, Google Building Browser?, and "Burly" Bloggers "Draw Blood" in CBS Bush Bash.



Total number of comments: 721

Total number of commentors:

Total number of stories: 228

Total number of submissions: 253



On to October; phew, this is getting tiring. The news started out on a positive note with IT Skills Pushing Up Law Librarians' Pay, and on a funny note with Nudist Library wins Ig Nobel Prize. Cafés, Nancy Pearl, RFID, cell phones, car crashes and lottery-winning librarians were all part of a rather interesting and kooky month for library news. Shredding books with Lynne Cheney was the most popular political story in what was a very political month. It was a neat and weird month. Wal-Mart vs. George Carlin, Memories of Bill Katz and Gone With the Wind heirs threaten Project Gutenberg were a few of the more interesting stories that caught my eye.



Total number of comments: 1049

Total number of commentors:

Total number of stories: 303

Total number of submissions: 259



November. Ah, finally, November. I'm tempted to just skip it and say nothing happened. LIS Programs: Reclaiming the education of academic librarians started off a more serious month. By far the most important story for the country, no, the entire world, was, of course, LISNews: 5 Years and Going Strong. Apparently there was something going on with some election somewhere, so political stories also made up a big part of the news. FoxNews: Ashcroft to resign was the one that probably caught the most eyes. Some other stories of note: Thanks for Oprah: The Oprah Meme; Libraries in a bind...Libraries are finding it difficult to replace aging staff; A librarian on The Librarian; and Come the Revolution, the libraries would be first target.



Total number of comments: 1012

Total number of commentors:

Total number of stories: 265

Total number of submissions: 263


And last, but not least, December! December started off right with 'Blog' #1 word of the year, and then came probably one of my favorites of the whole year, A Billion-Dollar IPO for Johns Hopkins Libraries.

LISNews: a surprisingly interesting site

A rather nice compliment paid to us by
" For library news, go to LISNews, a surprisingly interesting site, which may be interesting because those of us over 30 depended so much on libraries for the functions we now routinely accomplish online."
Good stuff! A double compliment, one for us, and one for libraries.

According To Blake: Stories That Shaped 2004

Here's my list of stories that stood out last year. I struggled with the order, so I decided to leave the list unnumbered; many of the stories felt just as important as the others.

Google and Their Army of Scanners: Without a doubt the single biggest story of the year, the one with the longest-lasting impact, and also the one that has already been discussed to death. Love it or hate it, Google is going to have a huge effect on libraries in the coming years. This will prove to be the story that shapes 2005 or 2007, but it was the biggest story of 2004.

Politics: It was one of the biggest election years ever, and the library world was not immune to showing off some political heat. Conservatives, convinced the Bush administration is doing God's work, butted heads with liberals convinced Bush is the devil incarnate. The end of the free world is coming soon, that much everyone can agree on; we just can't agree on who's going to bring it. Librarians for Bush, Librarians against Bush, Radical Reference, and Laura Bush all helped make 2004 the most politically charged year since, well, since the last election at least.

Library Budgets: More important than Google, but not as exciting as politics, budgets were big (bad) news around the world. Buffalo almost bit the dust and Salinas, California, did; things continue to look bleak for public library budgets. Public libraries must ramp up their marketing in 2005 or face increasingly lean budgets that will only drive more people towards Google.

Technology: RFID, search engines, ebooks, wireless and some other cutting-edge technologies will continue to shape libraries in the years to come. While the printed word is sure to be with us at the end of 2005, we now face real competition from privately run corporations in an area where we once held a near monopoly: access to information. Librarians are working to move into the future as quickly as possible, all the while trying to balance issues such as copyright, privacy, long-term viability of storage media, and budgets.

Copyright & Legal Issues, Past, Present and Future: From the Patriot Act to fair use and copyright, most of the legal stories from this year were negative. CIPA, the E-rate problem and various other legal stories contributed to a glum year for libraries in the courts and on Capitol Hill. It remains to be seen what the conservative grip on everything political in the US will mean for libraries before the next election.

Censors: Conservatives were increasingly unhappy with what they found on the web and in their local libraries, and they weren't about to let something like the First Amendment stand in their way. If they don't like it, it's no good for anybody, especially if it's part of the dreaded homosexual agenda. "Compassionate conservatives" unleashed their compassion by banning gay marriage, and banning (or attempting to ban) books with any hint of the gay agenda. Of course, conservatives don't have a monopoly on censorship like they do on power. Conservatives happily point out that the liberal media elite, along with liberal campus elitists, continue to censor stories on religion and conservative values. Let the "I know you are but what am I" debates begin.

Filtering: The issue that never dies, filtering will be a hot-button issue so long as there are children and an Internet in libraries. Some will probably take me to task for even separating filters from censors, but filters continue to take center stage in the battle against evil. Balancing the desire to protect kids from the Internet, the need to ensure everyone has access to information, and the need to set some kind of standards is a tricky proposition that will most likely be a big part of our news for years to come.

Blogs: With "blog" the number one word of the year, 2004 saw us bloggers gain at least a toehold in mainstream society. Pundits argue, writers write, bloggers blog; everyone, it seems, has an opinion on the importance of blogs today and in the future. For an increasing number of people, blogs are a primary source for much of what we learn about the world around us. It remains to be seen if our numbers can grow, or if what we are learning (and sharing) is worthwhile, or even accurate. Bloggers even got a spot at the big party conventions, and we had one librarian at the Democrats' table.

Ashcroft: Love him or hate him, he made headlines all year in the library world. His "hysterical librarians" comment will continue to echo through our halls long after he's been replaced. We can only hope his replacement proves to be half as entertaining and confrontational to librarians. We can also hope he's just as effective at keeping our country attack-free in 2005.

Open Access: Was 2004 the year open access finally got some traction? Maybe. With projects like DSpace, a push from the NIH, a flood of open access journals, and prominent defections from the old-world journals, 2004 might have been the tipping point for the Open Access folks.

Open Source In Libraries: Was 2004 the year open source finally got some traction? Probably not, but we're getting there (The Linux librarian might say yes). We now just need to convince some of the bigger libraries (and library organizations) these are projects worthy of their support and further funding. Open source is still not accepted in mainstream library decision making, but with high profile projects like Firefox and Thunderbird finally hitting 1.0, we may see open source taking a bigger role in 2005.

Graphic Novels: Librarians, libraries and the American public finally caught on to the wave that swept through Japan years ago: manga. Incredible hand-eye coordination and some active imaginations are moving comics from the back shelf to required reading for many American teenagers.

The Librarian Movie: You probably hated it, Rochelle sure did, but if watching a beautiful woman in a tank top run around in the jungle for 2 hours isn’t entertainment, I don't know what is! Noah Wyle is sure to sign up for a sequel, and if we're lucky a mini-series. If nothing else it showed just how insecure many librarians are with our "professional image."

Harry Potter: The boy wonder continues to make headlines for doing what he does so well. With book 6 not due for another 6 months, Hogwarts fans were clamoring all year for any new tidbit of news on Harry and the gang. In 2004 you had to be satisfied with a movie and a few DVDs, more than most books can even dream about.

Library Crimes: I don't know if we just reported more stories than in previous years, or the number of library-related crimes really did jump this year, but it's been a banner year for library whodunits. From the deputy librarian to sleeping burglars, pepper spray and map thieves, libraries worldwide have become a target of increasingly expensive crimes.

Coffee & Cell Phones: A popular topic for discussion, cell phone blockers and in-library coffee shops made our lives a little quieter and maybe a little jumpier. Rochelle called this the "Continuing book-storization of Libraries," which also included patrons who think it's perfectly okay to let their cell phones play "What do you do with a drunken sailor" and cut off a librarian mid-reference interaction.

Nancy Pearl: The only librarian to be immortalized in doll form, Nancy made the rounds on NPR and made headlines from Seattle to Washington. Our obsession with the "librarian image" will never die.

Rory Litwin Vs. LISNews: Only a big story to the LISNews crowd, The Juice Vs. The LISNewsterz. Rory took me to task, the LISNews community took him to task, and our relationship ended badly.

Honorable Mentions: The Clinton library, Marion County (FL), The Misspelled Mural, The mis-numbered Clock and the odd Michigan Patron Information story.

Some LISNews Numbers for Year 2004

Pulled from the Slashcode database, here are some of the more interesting numbers that cover all of 2004. I'll have the Apache log stats via Urchin posted sometime later this week. There are probably some interesting things I didn't think about when building this list, so do let me know if you have any other ideas.

Busiest authors:
Blake : 1229
Rochelle : 928
birdie : 304
Ieleen : 211
Amke : 176
Anna : 135
Samantha : 118
Dan G. : 104
Tania : 72
Bill Drew : 61
John : 51
Bibliofuture : 50
bentley : 48
Daniel : 46
Mock Turtle : 36
rudimyers : 34
Karen K : 28
Jonathan : 23
thesaint : 21
Ryan : 19
Kate : 16
jen : 15
Louise : 14
Karl : 13
Aaron : 13
Richtea : 11
Brian : 10
mcbride : 10
ADForte : 9
Dennie : 8
rramsey : 7
Marlene : 7
bibphile : 5
Senny : 5
Ami : 4
jessb : 3
SaRoo : 3
misseli : 3
Sansanee : 3
InfoWhale : 3
eho : 2
amanda : 2
jeff : 2
tonyb : 1
Arnie : 1
Celine : 1
Marcy : 1
BrianS : 1
Ben : 1

Authors hits per story:
Daniel : 671.35
eho : 620.00
Aaron : 595.31
Brian : 531.20
Jonathan : 519.74
Sansanee : 514.00
rramsey : 513.86
BrianS : 513.00
Ryan : 507.05
John : 487.80
Karl : 472.23
Anna : 466.85
Ben : 460.00
Bill Drew : 459.11
Bibliofuture : 455.98
Rochelle : 429.50
Celine : 429.00
Louise : 415.57
bentley : 390.08
misseli : 390.00
amanda : 381.00
thesaint : 367.76
Samantha : 356.91
birdie : 348.02
jen : 334.60
Senny : 332.60
Arnie : 328.00
Blake : 323.59
Dan G. : 311.33
Marlene : 304.71
Mock Turtle : 284.53
Amke : 279.92
jeff : 275.50
rudimyers : 259.50
ADForte : 254.89
Karen K : 241.50
Tania : 232.88
mcbride : 224.90
Kate : 220.75
Dennie : 216.88
Ami : 216.25
jessb : 206.33
Ieleen : 203.69
bibphile : 203.60
Richtea : 193.18
Marcy : 169.00
SaRoo : 108.33
tonyb : 87.00
InfoWhale : 55.00
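Both author lists above come from the same per-story data, ranked two different ways. Here's a minimal sketch of how the two rankings relate, assuming each story record carries just an author name and a hit count. The record shape, the `author_stats` function, and the sample names and numbers are all hypothetical; the real figures were pulled from the Slashcode database, whose schema isn't shown here.

```python
"""Sketch: 'busiest authors' and 'hits per story' from per-story records.

The (author, hits) record shape and the sample data are hypothetical.
"""
from collections import defaultdict


def author_stats(stories):
    """Return (busiest, hits_per_story), each a list sorted high to low."""
    story_counts = defaultdict(int)
    total_hits = defaultdict(int)
    for author, hits in stories:
        story_counts[author] += 1
        total_hits[author] += hits

    # Busiest authors: the number of stories each author posted.
    busiest = sorted(story_counts.items(), key=lambda item: item[1], reverse=True)

    # Hits per story: total hits divided by that author's story count.
    hits_per_story = sorted(
        ((author, total_hits[author] / count) for author, count in story_counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return busiest, hits_per_story


if __name__ == "__main__":
    sample = [
        ("alice", 300), ("alice", 350), ("alice", 250),
        ("bob", 400), ("bob", 240),
        ("carol", 620),
    ]
    busiest, per_story = author_stats(sample)
    print("Busiest authors:", busiest)
    print("Hits per story:", [(a, round(h, 2)) for a, h in per_story])
```

Because hits per story is an average, it ignores volume: in the real lists, eho posted 2 stories and ranks second in hits per story, while Blake, with 1,229 stories, sits well down that list.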

Total number of comments: 11627
Total number of commentors: 362
Busiest commentors:
Anonymous Patron : 940
GregS* : 742
nbruce : 708
mdoneil : 660
Rochelle : 520
Fang-Face : 503
Daniel : 460
tomeboy : 457
ChuckB : 423
birdie : 398

Total number of stories: 3868

Total number of submissions: 3109

Total number of metamoderations: 2011
Total Fair: 1916
Total Unfair: 95

Total number of moderations: 1004
--Total up: 888
--Total down: 40
--Funny: 42

Average score of moderated comments: 1.7481
Comments with a score of 5: 287
Comments with a score of 4: 613
Comments with a score of 3: 1120
Comments with a score of 2: 2246
Comments with a score of 1: 6476
Comments with a score of less than 0: 142

Number of journal entries: 2032
Number of journal comments: 1924
Most prolific journalors:
313 : nbruce
222 : shoe
147 : slashgirl
143 : ChuckB
139 : Daniel
87 : Blake
84 : AshtabulaGuy
82 : Rochelle
69 : mdoneil
68 : Fang-Face

Number of people who logged in: 1725
Total number of user accounts: 3700

LISNews Statistics For December 2004

Happy New Year Everyone!

I'm working on the numbers for all of 2004; I hope to have those posted this weekend. For some odd reason Apache didn't write any log files for about 30 hours this month, so these numbers are a bit of a guess: I added in the average for the day we missed. That means this month's numbers are still a bit lower than reality.

Total Sessions 239,076.00
Total Pageviews 924,284.00
Total Hits 2,253,484.00
Total Bytes Transferred 23.83 GB

Average Sessions Per Day 7,486.32
Average Pageviews Per Day 28,847.87
Average Hits Per Day 70,434.97
Average Bytes Transferred Per Day 754.06 MB

Average Pageviews Per Session 3.85
Average Hits Per Session 9.41
Average Bytes Per Session 103.14 KB
Average Length of Session 00:10:28

The rest of these numbers (assuming I've coded this page correctly) are accurate.

Busiest authors:
Blake : 153
Rochelle : 71
Anna : 21
birdie : 18
Daniel : 15
Samantha : 9
bentley : 6
Bibliofuture : 5
John : 5
Bill Drew : 2
Arnie : 1
rramsey : 1
Ryan : 1
Ben : 1

Authors hits per story:
rramsey : 731.00
Bibliofuture : 614.00
Bill Drew : 591.50
Daniel : 464.13
Ben : 460.00
Rochelle : 433.23
Anna : 371.95
John : 352.40
Blake : 332.56
Arnie : 319.00
birdie : 303.94
Samantha : 282.67
Ryan : 266.00
bentley : 254.67

Total number of comments: 606
Total number of commentors: 90
Busiest commentors:
Rochelle : 52
GregS* : 52
nbruce : 42
ChuckB : 37
mdoneil : 32
Daniel : 29
twistedlibrarian : 23
Great Western Dragon : 21
Blake : 19
AshtabulaGuy : 18

Total number of stories: 309

Total number of submissions: 323

Total number of metamoderations: 1052
Total Fair: 989
Total Unfair: 63

Total number of moderations: 516
--Total up: 468
--Total down: 17
--Funny: 18

Average score of moderated comments: 1.7466
Comments with a score of 5: 16
Comments with a score of 4: 33
Comments with a score of 3: 68
Comments with a score of 2: 126
Comments with a score of 1: 351
Comments with a score of less than 0: 1

Number of journal entries: 149
Number of journal comments: 143
Most prolific journalors:
24 : nbruce
15 : Daniel
12 : slashgirl
10 : Blake
10 : Durst
9 : ChuckB
9 : twistedlibrarian
8 : AshtabulaGuy
7 : Bibliofuture
6 : Walt

Number of people who logged in: 379
Total number of user accounts: 3697

Bringing You The LISNews

I get 90% of the stories I post to LISNews through Google News. A few times a day I run searches for librarian, libraries, library, books, and a few other words and phrases, and that will usually dig up enough stories to fill a day's worth of LISNews. If I'm lucky there are enough submissions & other posts from LISNews authors that I won't even have to go searching for any more. Most of what I end up posting is news from wire services and the popular press. I post things that seem interesting to me, or things that I think will be of interest to others. It's a formula that seems to be working, though I'm not sure it's one that'll keep working forever. I've always thought of LISNews as a newspaper for librarians, without the original reporting.

I never, well almost never, point to articles in the library press. I do this for a couple of reasons. The first is I almost never read the library press. I don't really have any reason for avoiding librarian-oriented publications; I just don't find myself drawn to news aimed at librarians. My current job doesn't require much "librarian" reading, and when I have time to go searching for news, I don't look to other sources that are most likely already reporting the same things we are covering (or already covered) here. The second reason I avoid other "library" sources is I think you should be reading them already. Obviously this is a "do as I say, not as I do" thing, but I think librarians should be reading LJ, AL, and a few other general library-oriented magazines on a regular basis (along with LISNews). More importantly, I think most librarians DO read the sites I'm avoiding on a regular basis, so there's no need for me to keep my eyes peeled there for things to post to LISNews. I feel like I'd just be repeating what you've already read, and I'm always hoping to find that one good story that you'd never see anywhere else. A quick peek at AL reinforces my belief that I'm not missing much, at least there. The vast majority of the "Top Stories" were covered here. I guess that means we're doing a pretty good job. I do love Joseph Janes and I'll miss the Crawford files, but often the cover story at AL doesn't get me excited; it's the columns that interest me.

I think I've already written my thoughts on other blogs. I also avoid most of them, though now that LISFeeds is mo' better, I find myself reading the lisblogs far more often than I used to. I avoid them for largely the same reason: I don't want to feel like I'm just doing the same ol' thing that everyone else is doing. I'm afraid that if I do keep up on all the trends, stories and bickering back and forth on the librarian blogs, I'll become too involved with that and lose my focus on doing whatever it is that I'm already doing at LISNews. Maybe I am doing the same ol' thing everyone else is doing, but at least I'm ignorant and blissful.

Thoughts? Should I be watching the library press for tidbits? Am I a rotten arrogant bastard for avoiding the other blogs? What changes should I make for the new year?

Speaking of the new year, I'm about halfway through a look back at the stories of 2004.

End of the Year Random Notes

LISFeeds has come along nicely. Last week I added a little Feeds Search page, which is quite handy for checking to see who's talking about you, or watching for trends and such. Someday I'll add a "most popular links" page. For now the site is quite nice and actually has some usable features; until now it's been the ugly, ignored stepchild of the LIS family.

Popularity != Authority

I was reading something Seth wrote via LISFeeds this morning and noticed his link to Technorati's Top 100. 'Twas there I noticed something I hadn't earlier when I wrote my thoughts on popularity. Technorati calls their Top 100 list "The most authoritative blogs," ranked by the number of sources that link to each blog. My question is, can we say a site is authoritative because it is popular (i.e., linked to heavily)?

This reminds me of journal citations, power laws and impact factors. Things that are cited most often are considered the most useful. But is that the case here? I was looking down the list and noticed a few things. I regularly read, but never link to, 2 of the top 20 blogs; I've never heard of, nor read, 10 of the top 20, and the vast majority of the top 100. No, I should not be the guy in charge of ranking the most important blogs, but I should have, at the very least, seen each site at some point.

There is an implicit assumption made here that a link means something: that it's a vote of confidence, maybe a link of agreement, or admiration, or even acknowledgement of knowledge. They add all that up and call the sites that are linked to most often authoritative. It follows the same theory as impact factors, but none of the careful, deliberate, well-thought-out methods. A link is not the same thing as a citation. While a link can help us determine the popularity of a page or site, a link should mean little when determining the authoritative powers of any site; it should be simply one piece of the puzzle. Something tells me ranking a SuicideGirl, the NBA, and Evanescence in the top 100 most authoritative blogs in the world means there is a problem with these rankings, or at the very least, we can't call all these sites authoritative. If anything, many of these are the sites that are the most biased and therefore least authoritative sources of information available on the web.
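For comparison, the standard two-year journal impact factor (the figure published in Journal Citation Reports) isn't a raw tally of citations; it's a ratio over a fixed citation window, normalized by how much the journal published:

```latex
\mathrm{IF}_{Y} = \frac{C_{Y}}{N_{Y-1} + N_{Y-2}}
```

Here C_Y is the number of citations received during year Y by items the journal published in years Y-1 and Y-2, and N_{Y-1} and N_{Y-2} count the citable items (articles and reviews) it published in those two years. A straight count of inbound links has no comparable window or normalization.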

In some ways this list just follows the ratings we see on TV. People are drawn to simple, flashy and shallow, and this list reflects that very well. A list of truly authoritative blogs would first start with some sort of categorization: political blogs on one list, computer blogs on another, search engine blogs on a third. The authority of each blog could be based not just on the raw number of links, but rather on a scale more closely resembling PageRank. Many factors need to be considered beyond the idea that link==authority.
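To make the difference between "number of sources that link to each blog" and "a scale more closely resembling PageRank" concrete, here's a minimal sketch on a made-up link graph. The blog names and links are hypothetical, and `pagerank` is the textbook power-iteration form of PageRank, not Google's or Technorati's actual production ranking.

```python
"""Toy comparison: raw in-link counts vs. a simplified PageRank.

The blog names and links are hypothetical, invented for illustration only.
"""

# Each key links to the blogs in its list (a directed link graph).
TOY_LINKS = {
    "fan1": ["celebrity"],
    "fan2": ["celebrity"],
    "fan3": ["celebrity"],
    "celebrity": ["expert"],
    "expert": ["reference"],
    "reference": ["expert"],
}


def inlink_counts(graph):
    """Count how many distinct sources link to each node."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in set(targets):
            counts[target] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration: a link's weight depends on its source."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = set(graph[source])
            if targets:
                # Split this node's rank evenly among the blogs it links to.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over every node.
                for target in nodes:
                    new_rank[target] += damping * rank[source] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    counts = inlink_counts(TOY_LINKS)
    ranks = pagerank(TOY_LINKS)
    for blog in sorted(TOY_LINKS, key=ranks.get, reverse=True):
        print(f"{blog:10} in-links={counts[blog]}  pagerank={ranks[blog]:.3f}")
```

In this toy graph, "celebrity" collects the most in-links (three), but only from blogs nobody links to, so its score ends up below both "expert" and "reference." A PageRank-style scale weighs who is doing the linking, not just how many links there are.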

Popularity and authority are not the same thing.

20th Annual Computers in Libraries

The 20th Annual Computers in Libraries program is out, and LISNews will be representin' yo:
LISNews — Collaborative Blogging
Thursday, 4:15 p.m. – 5:00 p.m.


By Timothy Noah over at Slate, "If, after my journey is ended through this vale of tears, I should be favored with remembrance, it will likely be for the succor I provided holiday shoppers. It was I who discovered the customer service number for"
Amazon's Customer Service Number

Whither My Bookmarks

Seeing Walt mention he'd removed Orkut (or is it Okrut? I can never remember) made me think maybe I need some new tools. Not a day goes by when I don't think to myself, where the hell did I read that? Where's that link? It seems like every day now I wish I had bookmarked something I had read a few months back. When I need to find a story or piece of supporting information and I can't get it back, I realize that even in the age of Google some things are still hard to find. A quote I used in class said "We are looking for concepts but forced to search for words." I think of that every time I can't find something. I can usually remember bits and pieces of what I read, and that's usually enough to find it again thanks to "everything being findable," but it's really frustrating when I can't find something I just know I've read and it would be a perfect link for a story I'm posting. I'm a librarian, dammit, I should be able to find anything (or at least that's what my wife tells me). The trouble is I can only remember a few bits and pieces, just a part of the concept, and when I search I am limited to searching for keywords.

My bookmarks haven't been updated in months, I haven't touched my Furl account in weeks, and though I use it, my links page needs a rewrite. The web contains at least one copy of my "original" bookmark file (extra credit to anyone who can find it). That's the file I started to assemble when I first started surfing the web in earnest back around 1995 or '96, via AOL and at school. I remember that file growing like a weed for years. It was a collection: I carefully organized it, maintained it, and used it. It wasn't a collection like a curio cabinet full of those stupid weepy-eyed porcelain dolls. It was a collection I used every day, and I was proud of it. It was more like a library collection than a collection of dolls, but I treated it about the same. It was a source of pride to be shared with all who happened to wander into my home. At some point, maybe in mid 2001 or 2002, I stopped collecting links. Up to that point I felt there was a shortage of links, like I needed to get all I could and keep them safe. I felt like I'd never find my way back, like the web was a completely unnavigable backwater and without my trusty bookmarks I'd be lost. It was my feeble attempt at bringing order to the chaos that is the web. This seems to be a common trait in the librarian profession. But then, at some point, I started to think there was no more shortage, and I felt like I could find everything again. Maybe it was blogs, or Google, but the web got much smaller for me. The web was less a frontier to be explored and more an old friend I knew too well. What I knew was plenty, and what I'd collected was good enough. But like any good librarian I don't really believe that's good enough. I just don't seem to be able to do anything about it. I guess that's what blogs are all about, saving links and bringing some kind of order to the web, but they're just not the same as a good, up-to-date bookmarks file. What about using those bots and agents to go out and find me things?

I'd probably never use anything that promises to bring me more to read. How much more do I need? How many more links, how many more stories, how many more ideas can I handle? I think the answer to that is almost none. When I win the lottery and I'm able to work on LISNews/Host full time I'll need a few agents, a couple bots, and several dozen email alerts, but right now, I've got all I can handle. I can't even bear to add another link to my bookmarks file. It just seems like too much. To make matters worse, I have one computer I use all day at work, and one I use all night at home (yes, I spend every waking hour in front of a computer).

What I do like is the ability to very selectively monitor certain ideas in certain areas for changes. Take, for example, a news search for LISNews. I like to know when, if ever, LISNews comes up in newspapers, TV, magazines, and other major media outlets. What I wouldn't want to monitor would be how often the word library turns up. RSS seems like a good option here, but I just don't care about enough sites for it to be really useful.

So what would happen if I did allow an agent to go out and intelligently gather more for me? More sites, more news, more of everything. In theory, I'd allow that agent to learn, grow, and collect more than I ever knew I could be interested in. Something like OpenCola, which tells me about what I don't know, and didn't even know I would be interested in. So now what? In theory I've got the ability to learn about everything that might interest me. Does anyone with a full-time job have time for such things? I can't digest the information I find now. I can't even imagine adding to the current amount, even if it's perfectly suited to my interests.

The goal of an intelligent agent should not be more, but rather less. I want fewer stories to read, fewer sites to visit. From less I want more.

Email is broken

Last night, for the umpteenth time, LISHost was brought down by spammers. Several thousand messages to one address and POW, over 100 domains went dark (and this is not a small, weak server). Sometimes I'll just sit and watch the UCE (aka spam) get absorbed by LISHost, and the volume is simply amazing. I can only run sa-learn so many times before I just want to give up in disgust. I've got every trick in the book running, RBLs, SpamAssassin, and so on, and there's only so much I have the time and money to do.

So when I say "email is broken," I don't mean my email, I mean email. It's broken, and UCE broke it. A few years ago I kept hearing people say things like "oh, I don't get much, I just delete it" or "I don't mind, it's not so bad," well those days are long gone, and I don't know what can be done to fix it. If my server can be brought down because of the amount of spam going to one email address, and there's nothing I can do to stop it, we have a problem.

Most of what I am talking about is illegal unsolicited advertising from scam artists and assorted other criminals. So I'm equating spam with any other type of scam, and also with robbery, assault, and everything else evil and illegal. It's done illegally, by stealing resources from some people with the intent of stealing money from others. It's done by criminals who hide behind the anonymity provided so well by the internet.

How much spam is out there? I don't really know how anyone can even estimate such things. AOL says it blocks a billion messages a day. Judging from my AOL accounts, I believe them; actually, I think they're underestimating. Well over 90% of the email I get in my other accounts is spam. Other estimates say 40 to 60% of all email floating through the ether is now spam, and I'd say those are conservative numbers. In any case, the majority of email is now trash. My guess is 90% of all the email sent to LISHost is UCE as well.

How are they doing it? From what I've seen there's a large number of dictionary attacks that focus on a few domains on LISHost. They have a dictionary of names (carver, billington, smith, Hartman, crawford) and simply send an email to every name in their list hoping that some will make it through. They appear to do it from computers that have been cracked. They rarely use the same computer to send more than a couple messages to each server. Some of the worst, and impossible to block, are actually sent "from" me: the From: and To: lines in the mail are the same. Some of the other bad ones, though these are most likely the result of viruses, come from the "lisnews user team" and have attachments that probably open the infected machine to being used as a spam zombie.

So what can we do? As users, as librarians, as educators we are in a unique position to use a wide range of resources to help stem the tide. Through user education we can alert people to what's going on, and what they can do to help stop it. It looks like most of the spam coming into LISHost is from compromised home computers on high speed connections.

I am afraid this, coupled with various other nefarious doings on the internet, will be its downfall. As things get worse, governments or corporations will promise to make us safer by slowly clamping down on the openness that has been part of the internet for years. Trading freedom for security is an easy sell now, and I'm afraid once we've traded one for the other, we'll lose all the freedoms that made the internet what it is today.

So what are my options? I still have a few, the most drastic being to just give up. Slightly less drastic is to make a separate server. Something is going to need to change soon.

But let me finish on a positive note: I have a confession to make. I love AOL.
I just logged into the email account I use as the primary contact for all the domains I own, an AOL account I've had for 11 or 12 years now, and I was shocked at how little spam I had received. I hadn't checked that account in weeks, I had 12 new messages, and only 5 of them were spam. That I'd received only 5 in several weeks was, for me, truly shocking. It doesn't even seem possible. This gives me hope. It leads me to believe there are technologies that are winning in the war against spam. AOL deserves some praise for doing something very well. I have no idea how they are doing it, but they really seem to be winning.

Some Thoughts On Popularity

So Arnie asked, "am I being too easily impressed?" [with the LISNews stats]... My first reaction was "yes, you are, we ain't nutin" we are but a little blip on the radar of the internet. But, then I got to thinking. I have no idea just how impressive those numbers really are, maybe they really are impressive? Do I compare us to Slashdot, Metafilter, Fark and BoingBoing? Or do I compare us to The Resource Shelf, Library Stuff, The Shifted Librarian and Probably more the latter than the former, but I'm going to set out to find where LISNews ranks when compared to different web sites.

First, we already know web stats are notoriously uneven, and inconsistent, but they're all we have. I'll do my best to compare apples to apples here.

So where does LISNews fall in the great blogosphere popularity bell curve? Somewhere in the middle or so…

When we can, let's just consider hits, pageviews and sessions when we talk about popularity from the server side. I'll also delve a bit into RSS feeds, but that's a big can o' worms subject to an entire new set of biases and reporting errors. Unfortunately for this piece, there is no single Blogostats page where one can browse stats from any blog of interest. But, luckily, there are some ways we can get some educated guesses at who's doing what in terms of numbers. From what I could find, The Truth Laid Bear: Traffic Ranking Page is about the best we can do. It's an impressive collection of numbers from 250 blogs covering a wide range of topics. [See also a good Look At Those Numbers] Another good piece on this topic is Shirky's Power Laws, Weblogs, and Inequality, which goes into some issues with ranking blogs against each other.

So if LISNews were on The Truth Laid Bear: Traffic Ranking Page, we'd come in around #40 using this month's numbers (so far). My only quarrel with those numbers is that he's using "visits," a notoriously hard number to measure. What I'd like to do is figure out a nice algorithm that would take into account visits, pageviews and hits, and rank those sites on that magical number. We don't really fit their "ecosystem" numbers, so there's not much point in trying to put us on that list. So, 40 out of 250 ain't bad! What other sites play the numbers game that we can try to fit ourselves into?
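Here's one minimal way such a combined number might work, sketched in Python. The three sites, their figures, and the weights are all made up for illustration. Nothing here reflects how The Truth Laid Bear or Urchin actually rank anything. Each metric is scaled against the largest value in the group so that hits, which run into the millions, don't swamp visits.

```python
# Hypothetical monthly figures for three made-up sites.
SITES = {
    "site_a": {"visits": 200_000, "pageviews": 800_000, "hits": 2_000_000},
    "site_b": {"visits": 250_000, "pageviews": 400_000, "hits": 900_000},
    "site_c": {"visits": 50_000, "pageviews": 150_000, "hits": 600_000},
}

# Assumed weights: pageviews count most, raw hits least.
WEIGHTS = {"visits": 0.4, "pageviews": 0.5, "hits": 0.1}


def composite_scores(sites, weights):
    """Scale each metric to 0..1 by the largest value seen, then take a weighted sum."""
    maxima = {metric: max(stats[metric] for stats in sites.values()) for metric in weights}
    return {
        name: sum(weight * stats[metric] / maxima[metric] for metric, weight in weights.items())
        for name, stats in sites.items()
    }


def rank(sites, weights):
    """Site names ordered from highest composite score to lowest."""
    scores = composite_scores(sites, weights)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank(SITES, WEIGHTS))  # ['site_a', 'site_b', 'site_c']
```

Ranked on visits alone, site_b would come first. Once pageviews and hits are folded in, site_a moves ahead. A single "visits" column can hide that kind of shift.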

The Technorati Top 100 has a neat way of doing things: they rank blogs by the number of blogs linking in, kind of a Google PageRank for blogs. Not surprisingly, LISNews doesn't show up on that list. It's probably no surprise that librarian stories aren't of burning interest to the world at large.

There are a few other sites out there that pretty much replicate this model, or measure something close to it. Truth be told, it's a hard thing to measure. They all do it a bit differently, and they all do a pretty good job. Jenny is the only one that I've seen on any of those lists. What else can be measured? RSS feeds!

Bloglines' Most Popular Feeds is a good place to start, and it's also a place where more than a few librarian sites show up. Jenny shows up as the #1 librarian feed on the list, with the relatively new LII feed a close second, followed by Research Buzz and Steven M. Cohen. They all fall in the top half of the list, I think (why would someone have a top 100 list with no numbers??). In any case, this is a librarian-friendly list that LISNews doesn't appear on. Again, there are several other sites with similar lists. Bloglines seems to have the most librarians on its top xxx list for some reason.

So, to summarize what I've learned so far: The Shifted Librarian is the most popular librarian blog, according to my interpretation of browsing a bunch of sites that attempt to measure popularity. (Since I host The Shifted Librarian, I can also say she is the most popular target for spammers as well, probably not the kind of award anyone wants to get!) I think saying that Jenny is most popular is a) an honor and b) correct. It's an honor because she's popular in an unbelievably huge and crowded field full of talented people. It's correct because, well, the lists that rank these kinds of things say it's so. But c) it's not entirely accurate, because these lists are rather insular and limited in scope, and they probably miss many good sites. Special niche sites like, and LISNews don't show up here because we are popular in a different way. We have limited appeal to a limited audience and aren't linked to by the blogosphere because what we write about isn't very interesting to most people. Even though these numbers are far from perfect, they're all we have, and they all pretty much agree "Shifty" takes the LISBlogoprize for most popular librarian blog. She's got a great domain name, great posts that appeal to a wide variety of folks, and a great hook.

So in what ways can we measure popularity? By my way of thinking the broadest measures are the best. That's why I say Jenny is the most popular. She shows up on the most lists, and since I can cheat and look at her site stats on LISHost, I know her site is quite popular. But because we measure such a diverse set of sites and places in so many different ways, there are many different factors to keep in mind. Other things to think about [that we can't measure very well]:
Number of folks on the mailing list (from what I hear resourceshelf would win hands down).
The aforementioned Apache logs: as best I can tell, LISNews comes out on top there on some measures, but is beaten by Shifted on others.
Number of subscribers to RSS feeds: that's a Jenny victory again.
What about number of participants and authors? Not a fair comparison, but nonetheless one that interests me, since that's what LISNews is all about.
Number of links between the library blogs. If I had the time I could pull those numbers out of LISFeeds.
The diversity of pages, that is, how many different pages are being viewed and hit.
How frequently the site is updated, how long it's been around, and so on.
All interesting things that would be fun to study.

What's important to me about the LISNews numbers is not necessarily the raw numbers, but that we have a wide range of people participating, reading, and sharing stories.

So to answer you, Arnie: no, you're not too easily impressed. I'd say our numbers are impressive! Having watched them grow from 0 to what they are today, I'm mighty impressed that so many people see some value in what we're doing. There's probably only one other thing I've put more time into than LISNews, and that's the deck I built on my house this summer, which is not something as easily shared with people around the world as LISNews is!

It's impressive that there is a wide range of people reading a wide range of stories and participating in some way on the site. And yes, the numbers themselves are impressive, even when compared with most other sites. And you should also be impressed with The Shifted Librarian's popularity!

Several Random Tech Notes

Last class was yesterday, I'm sad it's over already. I think we had a good semester, the applause at the end of the last class leads me to believe the class did as well. It was a great bunch O' students, hopefully they learned as much, or better yet, MORE, than I did. Now I just keep my fingers crossed that the department will ask me back again. I've never had more fun teaching or taught a better class, ever. Good people, those future librarians of America.

I've moved all my domains from Network Solutions, so for the first time in 6 years I won't be sending them any money this year. I've got them all with EV1Servers now, the same place that holds the LISNews and LISHost servers (yes, they're now 2 separate servers). I've gone from paying $35 a year to $7 a year for the same exact thing.

Speaking of LISHost, there are now 101 domains on that box. I think we're quickly approaching our limit. I'm going to move the busiest sites off the .org box and back onto the .com box to even things out a bit. A couple of domains are getting hit so hard with spam it actually slows the entire server down. Apparently I need to implement some more aggressive firewalling. I'm not sure what to do with this setup in the coming year. I'm just barely over breaking even now. Adding a third server would add additional headaches and an extra couple hundred bucks a month, but it would give us a lot more room for growth. Is that something we want? I certainly need more help if we do much more growing.

While I was writing up some more LISNews stats, I noticed we're averaging 8,300 sessions, 31,000 pageviews and 88,000 hits a day so far this month. I have no idea what's going on. I don't see anything really abnormal in the logs, other than that we're suddenly averaging an extra 1,000 sessions, 5,000 pageviews, and 17,000 hits a day compared with last month.

Thunderbird has lost me a lot of email. I can't recommend anyone use it at this point, I can't believe it was released as 1.0 with such a serious bug in it. ggrrrrrr

LISFeeds is now at version 2.0 [Beta]. I've rewritten the back end completely and added several dozen more feeds; I think we're over 100 now. There are still a few bugs floating around in there, but it gets the job done. Of particular interest to readers of this, there is a feed for LISNews journals. LISFeeds isn't reading it correctly at the moment, so I'm not sure if the trouble is with the feed or with my code on LISFeeds.

The LISNews Numbers for November

Eek, 'tis December already? 2004 came and went like a stick
of butter in a hot oven. So let's have a look at the stats for November. Most
numbers were up again for November. I keep thinking this month will be the last
month of any growth, but I'm proven wrong again and again. Urchin had this
to say of last month:


Total Sessions: 206,115
Total Pageviews: 788,790
Total Hits: 2,130,467
Total Bytes Transferred: 21.19 GB

Average Sessions Per Day: 6,870.50
Average Pageviews Per Day: 26,293.00
Average Hits Per Day: 71,015.57
Average Bytes Transferred Per Day: 723.15 MB

Average Pageviews Per Session: 3.83
Average Hits Per Session: 10.34
Average Bytes Per Session: 107.78 KB
Average Length of Session: 00:10:25

Unique IPs: 37,878
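The averages in the list above follow directly from the totals. As a quick sanity check, this Python snippet recomputes the per-day and per-session figures from the November totals. It assumes Urchin simply divides by the 30 days of November and by total sessions. Bytes are left out because the 21.19 GB total is already rounded.

```python
# November 2004 totals as reported by Urchin.
TOTAL_SESSIONS = 206_115
TOTAL_PAGEVIEWS = 788_790
TOTAL_HITS = 2_130_467
DAYS_IN_NOVEMBER = 30


def derived_averages():
    """Recompute Urchin's averages from the monthly totals (rounded to 2 places)."""
    return {
        "sessions_per_day": round(TOTAL_SESSIONS / DAYS_IN_NOVEMBER, 2),
        "pageviews_per_day": round(TOTAL_PAGEVIEWS / DAYS_IN_NOVEMBER, 2),
        "hits_per_day": round(TOTAL_HITS / DAYS_IN_NOVEMBER, 2),
        "pageviews_per_session": round(TOTAL_PAGEVIEWS / TOTAL_SESSIONS, 2),
        "hits_per_session": round(TOTAL_HITS / TOTAL_SESSIONS, 2),
    }


if __name__ == "__main__":
    for name, value in derived_averages().items():
        print(f"{name}: {value}")
```

Every figure comes out matching the Urchin report, so "average per day" here really is just the monthly total divided by 30.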


Most popular Journals: nbruce, Daniel, shoe, slashgirl,
birdie, djfiander, Blake


Most popular Journal FEEDS: Walt, slashgirl, nbruce, Blake,
Samantha, shoe, Ashtabulaguy


We had visitors from 123 countries, The US, Canada,
Australia, the UK, and Switzerland being the most popular.


Firefox makes up almost 6% of all sessions, vs. around 23% for IE. But looking back a month, Firefox didn't even show
up in the browsers list, and IE was almost 25% of sessions, so is that a HUGE
jump for Firefox in just one month? I'm not sure. I didn't update Urchin, so
my guess is that so few people were using Firefox in October that it didn't register,
and Urchin only now "discovered" it as a browser. In any case, IE is definitely trending
downward this year.


I wish there were a way to grep out just the weekdays, but
I'm fairly sure November was the first month ever with a weekday average of over 7k sessions a day:
most, if not all, weekdays saw well over 7k sessions.
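The Apache timestamp doesn't include the day of the week, which is why a plain grep can't pick out weekdays. Here's a minimal Python sketch of one way around that: parse the bracketed timestamp with `datetime.strptime` and keep only Monday through Friday. The daily session figures in the example are hypothetical, not LISNews numbers.

```python
from datetime import date, datetime
from statistics import mean

# Matches the bracketed field in an Apache log line, e.g. 07/Nov/2004:05:32:14 -0500.
# Note: %b expects English month abbreviations (the default C locale).
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def is_weekday(apache_timestamp):
    """True if the timestamp falls Monday through Friday."""
    when = datetime.strptime(apache_timestamp, APACHE_TIME_FORMAT)
    return when.weekday() < 5  # Monday is 0, Sunday is 6


def weekday_average(daily_sessions):
    """Average a {date: sessions} mapping over weekdays only."""
    return mean(count for day, count in daily_sessions.items() if day.weekday() < 5)


if __name__ == "__main__":
    print(is_weekday("07/Nov/2004:05:32:14 -0500"))  # False: a Sunday
    # Hypothetical daily session counts for four days in November 2004.
    sample = {
        date(2004, 11, 5): 7500,  # Friday
        date(2004, 11, 6): 4000,  # Saturday
        date(2004, 11, 7): 4200,  # Sunday
        date(2004, 11, 8): 7700,  # Monday
    }
    print(weekday_average(sample))  # 7600
```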


The database numbers looked like this (keep in mind I think
I have a couple bugs in my numbers here):


Busiest authors:
Blake : 91
Rochelle : 80
birdie : 48
bentley : 11
Daniel : 9
Bibliofuture : 8
John : 6
Samantha : 3
Louise : 3
Bill Drew : 2
Karen K : 1
Aaron : 1
Ryan : 1
rudimyers : 1

Authors hits per story:
John : 811.83
Ryan : 515.00
Bill Drew : 402.00
bentley : 334.55
Rochelle : 305.70
Aaron : 288.00
Blake : 287.11
Bibliofuture : 275.25
birdie : 274.33
Daniel : 268.78
Louise : 267.00
Karen K : 240.00
rudimyers : 195.00
Samantha : 153.00
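The "hits per story" list is an average: an author's total story hits divided by the number of stories they posted. Here's a minimal Python sketch of that calculation. It assumes we already have one (author, hits) pair per story, and the authors and hit counts in the example are made up.

```python
from collections import defaultdict


def hits_per_story(stories):
    """stories: iterable of (author, hits) pairs, one pair per story posted that month.

    Returns {author: average hits per story}, rounded to two decimals.
    """
    total_hits = defaultdict(int)
    story_count = defaultdict(int)
    for author, hits in stories:
        total_hits[author] += hits
        story_count[author] += 1
    return {author: round(total_hits[author] / story_count[author], 2) for author in total_hits}


if __name__ == "__main__":
    # Hypothetical stories: one author with three, another with a single popular story.
    month = [("alice", 250), ("alice", 300), ("alice", 290), ("bob", 515)]
    averages = hits_per_story(month)
    for author in sorted(averages, key=averages.get, reverse=True):
        print(f"{author} : {averages[author]:.2f}")
```

Because it's an average, an author with only one or two stories can land at the top of the list on the strength of a single popular post.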

Total number of comments: 1017
Total number of commentors: 95
Busiest commentors:
nbruce : 114
GregS* : 94
twistedlibrarian : 72
Rochelle : 61
birdie : 56
ChuckB : 50
Daniel : 46
Fang-Face : 41
tomeboy : 41
mdoneil : 35

Total number of stories: 265
Total number of submissions: 263

Total number of metamoderations: 1072
Total Fair: 1039
Total Unfair: 33

Total number of moderations: 549
--Total up: 474
--Total down: 24
--Funny: 28

Average score of moderated comments: 1.7459
Comments with a score of 5: 12
Comments with a score of 4: 28
Comments with a score of 3: 71
Comments with a score of 2: 177
Comments with a score of 1: 710
Comments with a score of less than 0: 3

Number of journal entries: 224
Number of journal comments: 218
Most prolific journalors:
51 : nbruce
16 : slashgirl
16 : twistedlibrarian
13 : Daniel
13 : Durst
12 : ChuckB
11 : Rochelle
10 : AshtabulaGuy
10 : mdoneil
8 : Blake

Number of people who logged in: 239
Total number of user accounts: 3606

Some Other LISNews Numbers

Last time I wrote on stats I covered the Web stats,
statistics we can grep out of the Apache log file, which gets a new line each time a page is
served. This time I thought I'd share more of the numbers gathered by Slashcode.
For the really curious, you can read through every line of Slashcode yourself
over on Sourceforge, but I’ll just pull out some of the more interesting numbers
for now. I've actually built a dynamic page that pulls out these numbers from
the database in real time, and it's quite easy to add in new queries, if you
have any ideas let me know. The trouble is, at the moment, it's a HUGE hit to
the database, and it's not something I can afford to have being loaded several
thousand times a day. I'll build a static version and post a link as soon as I
have some time. Class is almost over so hopefully I'll have some more time to
devote to the LISNews code.

There are several dozen Slashcode tables, but we only need
to worry about a few of them at this point. Users, Stories, Moderatorlog,
metamodlog, comments, discussions and journals gather the numbers we're looking
for. I'll skip the lesson on relational database theory, so it's up to you to
make sense of anything that reads like code.

Note: These numbers are for the month of October 2004. I'll
post this month's numbers once we've reached December.

Busiest authors:
Rochelle : 94
Blake : 85
birdie : 56
bentley : 31
Daniel : 10
Bibliofuture : 6
rudimyers : 5
Dan G. : 5
Ryan : 4
John : 3
Samantha : 1
Dennie : 1
Karen K : 1
Bill Drew : 1

That one is pretty simple. I just count the number of
articles per userid and match the id against the Users table to figure out who's
who. Rochelle was a busy bee indeed during October! If I remember right, she
also hit her 1,000th story somewhere in there. This is one of those
areas where I'm not sure we could ever have a list that was too long. Breadth
and depth are vitally important, and they're something we do fairly well, though
maybe not as well as I'd like. Having 14 authors post in a month is fantastic,
something I could only dream of a year or two ago. There are a number of ways we
can do better that I am considering.
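Here's what that count-and-join might look like, written as a small Python script against SQLite so it runs anywhere. The `users` and `stories` layouts are simplified, hypothetical stand-ins for the real Slashcode tables, which have many more columns. The `time` column and its "YYYY-MM-DD HH:MM:SS" format are also assumptions.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the Slashcode users and stories tables.
SCHEMA = """
CREATE TABLE users (uid INTEGER PRIMARY KEY, nickname TEXT);
CREATE TABLE stories (stoid INTEGER PRIMARY KEY, uid INTEGER, time TEXT);
"""


def busiest_authors(conn, month_prefix):
    """Count stories per author for one month, e.g. month_prefix='2004-10'."""
    query = """
        SELECT users.nickname, COUNT(*) AS story_count
        FROM stories
        JOIN users ON users.uid = stories.uid
        WHERE stories.time LIKE ? || '%'
        GROUP BY users.uid, users.nickname
        ORDER BY story_count DESC, users.nickname
    """
    return conn.execute(query, (month_prefix,)).fetchall()


def demo_connection():
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany(
        "INSERT INTO stories (uid, time) VALUES (?, ?)",
        [
            (1, "2004-10-01 09:00:00"),
            (1, "2004-10-02 10:00:00"),
            (2, "2004-10-03 11:00:00"),
            (2, "2004-11-01 08:00:00"),  # November: excluded from October's count
        ],
    )
    return conn


if __name__ == "__main__":
    print(busiest_authors(demo_connection(), "2004-10"))  # [('alice', 2), ('bob', 1)]
```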

Total number of comments: 1051
Total number of commentors: 101
Busiest commentors:
GregS* : 126
nbruce : 84
birdie : 80
ChuckB : 50
tomeboy : 48
Rochelle : 42
Fang-Face : 41
mdoneil : 40
Daniel : 39
Bibliofuture : 38

Another rather simple one. A few different count() calls run on the comments
table, with a join onto users to pull out the names. This is another area that
could use some more people, though unlike the number of authors, this one is
completely out of my hands. The 10 busiest commentators account for more than
half of all comments posted. I don't know that this is really all that different
at any other site that allows the kinds of collaboration we do. It would be
great if we had double or triple these numbers, but this is another number that
I couldn't have ever imagined reaching.
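The "more than half" claim is easy to check from the October numbers listed above. This short Python snippet sums the ten busiest commentors and divides by the month's total comment count.

```python
# October 2004 comment counts for the ten busiest commentors, and the month's total.
TOP_TEN = [126, 84, 80, 50, 48, 42, 41, 40, 39, 38]
TOTAL_COMMENTS = 1051

top_ten_total = sum(TOP_TEN)
share = top_ten_total / TOTAL_COMMENTS
print(f"Top 10 commentors: {top_ten_total} of {TOTAL_COMMENTS} comments ({share:.1%})")
# Top 10 commentors: 588 of 1051 comments (55.9%)
```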

Total number of stories: 303

select count(*) from stories where date like thismonth. Nothing to it. What's
the magic number? How much is too much? No idea. I suspect more than one an hour
is too much, but for some people it's probably not enough.

Total number of submissions: 259

select count(*) from submissions where date like thismonth.
I know for sure this number could never be high enough. Submissions are the #1
most important part of the site.
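The `where date like thismonth` above is shorthand. One runnable way to express it, sketched here in Python with SQLite, is a half-open range from the first of the month up to (but not including) the first of the next month. A plain range comparison like this is generally easy for a database to serve from an index on the date column. The `time` column name and its "YYYY-MM-DD HH:MM:SS" text format are assumptions.

```python
import sqlite3
from datetime import date


def month_bounds(year, month):
    """Return the first day of the month and the first day of the next month, as ISO strings."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


def count_for_month(conn, table, year, month):
    """Count rows whose `time` falls within the given month (hypothetical column name)."""
    # Table names can't be bound as SQL parameters, so accept only known ones.
    if table not in ("stories", "submissions"):
        raise ValueError(f"unexpected table: {table}")
    start, end = month_bounds(year, month)
    sql = f"SELECT COUNT(*) FROM {table} WHERE time >= ? AND time < ?"
    return conn.execute(sql, (start, end)).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE submissions (subid INTEGER PRIMARY KEY, time TEXT)")
    conn.executemany(
        "INSERT INTO submissions (time) VALUES (?)",
        [("2004-09-30 23:59:59",), ("2004-10-01 00:00:00",), ("2004-10-31 22:15:00",)],
    )
    print(count_for_month(conn, "submissions", 2004, 10))  # 2
```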

Total number of metamoderations: 1114
Total Fair: 1074
Total Unfair: 40

The first one is just another count(); the next two limit the count to
either 1 (fair) or -1 (unfair). I'm actually surprised this number is so high.
I'd be curious to see if there are just a couple people participating, or we've
got a large group metamoderating.
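The fair/unfair split can come out of a single pass over the table with conditional sums. Adding `COUNT(DISTINCT uid)` to the same query would answer whether a couple of people or a large group are doing the metamoderating. This Python/SQLite sketch assumes a simplified, hypothetical `metamodlog` table with `uid` and `val` columns. The real Slashcode table may be laid out differently.

```python
import sqlite3


def metamod_summary(conn):
    """Totals for a hypothetical metamodlog(uid, val) table, where val is 1 (fair) or -1 (unfair)."""
    total, fair, unfair, people = conn.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN val = 1 THEN 1 ELSE 0 END),
               SUM(CASE WHEN val = -1 THEN 1 ELSE 0 END),
               COUNT(DISTINCT uid)
        FROM metamodlog
        """
    ).fetchone()
    # SUM() returns NULL (None) on an empty table, so fall back to 0.
    return {"total": total, "fair": fair or 0, "unfair": unfair or 0, "metamoderators": people}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metamodlog (id INTEGER PRIMARY KEY, uid INTEGER, val INTEGER)")
    conn.executemany(
        "INSERT INTO metamodlog (uid, val) VALUES (?, ?)",
        [(1, 1), (1, 1), (2, 1), (3, -1)],
    )
    print(metamod_summary(conn))
```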

Total number of moderations: 780
--Total up: 571
--Total down: 47
--Funny: 66

Why did I choose to single out the "funny" moderation?
I'm actually surprised this number is lower than the
metamoderation number. I suspect it has something to do with the ability to post
and moderate in the same discussion, or lack thereof, or, there's something
wrong with my queries.

Average score of moderated comments: 1.7505
Comments with a score of 5: 22
Comments with a score of 4: 47
Comments with a score of 3: 89
Comments with a score of 2: 195
Comments with a score of 1: 666
Comments with a score of less than 0: 5

Just a bunch of count()s. Moderations, like
metamoderations, are overwhelmingly positive. Most comments start out with a
score of 1, many with a score of 2, so anything above 2 has been modded up at
least once. Anything below 0 has been modded down at least twice. I've always
liked the ability to just view the 5s at Slashdot, though it's not a problem here
since most stories get very few comments. Moderation was the big reason I chose
to move to Slashcode. The interaction between moderation and metamoderation is
pretty darn interesting, and I would like to build a couple pages to have a look
at how that's all working.

Number of journal entries: 193
Number of journal comments: 185
Most prolific journalists:
30 : nbruce
14 : Daniel
13 : birdie
11 : mdoneil
11 : bentley
11 : Fang-Face
10 : AshtabulaGuy
9 : slashgirl
9 : ChuckB
9 : Walt

Journal comments and regular comments are stored in
different tables for some reason, so they're easy to track separately. The
journals are jumpin' this month, and have been attracting far more comments than
the main stories. My first project when I have some time to devote to the code
is going to be to set up a Journals section, probably at,
that builds an index page based on journal entries. That will give everyone the
ability to essentially post to the index page of a section.

Number of people who logged in: 192
Total number of user accounts: 3574

There is no field for account added date (now that I write
this, there should be) so I can't tell for sure when a new account was added. I
suppose I could start keeping track of that, or even figure it out by looking at
when someone left a comment for the first time. The larger the number of people
logged in, the more participation, the better LISNews becomes.
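The "first comment" workaround can be sketched as a nested query: find each user's earliest comment, then count how many users first appeared in each month. As with the other sketches, the `comments(uid, date)` layout is a simplified, hypothetical stand-in. This approach only catches people who have commented at least once.

```python
import sqlite3


def first_time_commenters_by_month(conn):
    """Count users by the month of their earliest comment (hypothetical comments(uid, date) table)."""
    return conn.execute(
        """
        SELECT substr(first_comment, 1, 7) AS month, COUNT(*) AS new_commenters
        FROM (SELECT uid, MIN(date) AS first_comment FROM comments GROUP BY uid)
        GROUP BY month
        ORDER BY month
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (cid INTEGER PRIMARY KEY, uid INTEGER, date TEXT)")
    conn.executemany(
        "INSERT INTO comments (uid, date) VALUES (?, ?)",
        [
            (1, "2004-09-14 10:00:00"),
            (1, "2004-10-02 09:30:00"),  # not counted again: user 1 first appeared in September
            (2, "2004-10-20 16:45:00"),
            (3, "2004-11-03 08:00:00"),
        ],
    )
    print(first_time_commenters_by_month(conn))  # [('2004-09', 1), ('2004-10', 1), ('2004-11', 1)]
```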

The numbers that interest me are the ones that show not
just total usage, but also show us just how many people are doing things. I'd
like to see more people participating. So if we have 1000 comments, but they're
from 10 different people, we're not really getting the kind of breadth in
participation I'd really like to see. My philosophy has always been there's
strength in numbers and the more people participating the better & stronger
LISNews becomes. We should never become an echo chamber. As many sides as
possible should be, as the kids say, "representin' yo."

One thing I know I've missed is the Zoo, aka "friends and
foes." I'm actually not quite sure what I can pull out of that table that would
be of any interest. I'm trying to maintain privacy on any topics that might
cause any hard feelings, and I'm afraid that might be one of them.

What's interesting is this month there are actually more
comments on the journals than the stories. I think this is a first, and it'll be
interesting to see if it's the start of a trend. It'll be very easy for me to
add new numbers to this list, so please let me know if I missed something, or
you have an idea for a new query. I'll also cut and paste the numbers for this
month once they're all tabulated.

A Longer Look At The LISNews Numbers

Every month I post some basic statistics about LISNews. I thought I'd write something up that explains what those numbers mean, and where they come from to help everyone understand not just the LISNews numbers, but web stat numbers in general.

Here's a well-known secret about web stats: they're less than perfect. But while they may be less than perfect, they're all we have to go on. Without using some of the more invasive methods like cookie pushing, we just can't be quite sure who's visiting our site, where they came from, what they did while here, and why they left. These are the things that drive webmasters crazy. To really be able to design a usable and user-friendly site, it helps to know why people come to your site, how they found it, what they used it for, and why they left. These are hard questions to answer when all you have is a line of text that looks like this:

888.665.555.1212 - - [07/Nov/2004:05:32:14 -0500] "GET /images/layout/slc.gif HTTP/1.1" 200 139 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

They say "write what you know" so I'll be focusing on LISNews, and the stats package we use at LISHost, Urchin. First let's quickly look at where most websites normally get their numbers from. Most server log files look something like this:

888.665.555.1212 - - [07/Nov/2004:05:32:14 -0500] "GET /images/layout/slc.gif HTTP/1.1" 200 139 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

From this we can learn a few things. First, the IP address of our visitor, which will often lead us back to their domain as well, so we can often figure out where in the world that computer was located. Then there's the date and time this hit was recorded, the type of request, the protocol used, the status code the server returned, the size of the file sent, the referring page, and finally the type of browser being used to make the request. All this on a single line. Each time a file is requested from the server, a similar line is written to the log file. For all but the most dedicated administrators, these lines are fed to programs that work through them and present the numbers in a more manageable and readable format. Before I talk about stats packages, I'll break down a log file line and cover what we learn from each bit.

Basically we have a mini text database of all requests served by our web server program, Apache. Each entry in our database sits on its own line. Within a line, the fields are separated by spaces, with a dash standing in for any field that has no value; the bracketed date and the quoted fields can contain spaces of their own. Let's take a look at our example, one field at a time:

888.665.555.1212: This is the user's IP address (in reality, this is an IP address I made up). It tells us where in the world this request was made from, that is, where we sent this file. Using a reverse DNS lookup, it is often possible to find the requesting computer's domain name as well.

[07/Nov/2004:05:32:14 -0500]: Next we have the date and time of the request, followed by the server's offset from UTC (-0500 means five hours behind).

"GET /images/layout/slc.gif HTTP/1.1": Now something not as easy to read: GET, followed by /images/layout/slc.gif, followed by HTTP/1.1. This tells us the browser sent the server a GET request (a request to retrieve a file), the name of the file it asked for, and the protocol it used to make the transfer.

200 139: Now a couple of seemingly random numbers pop up to confuse us even more. The first is the status code. Servers have a standard set of numbers they use to tell browsers how things went: a 404 means the file is missing, 200 means all is well, and several others are possible as well. The second number, 139, is the size of the file sent, in bytes.

"": A rather standard-looking URL. This is the referral, the page that was used to find whatever page was just served.

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)": And last, but not least, the browser (or "user agent"). This is a good place to learn how many search engines are hitting your site; often you'll see Googlebot listed as the browser.
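For the curious, here is a minimal sketch of how a stats program might split one of these lines into the fields described above. It assumes the log uses Apache's standard "combined" format (IP, two identity fields that are usually dashes, timestamp, request, status, size, referrer, browser); a site with a custom log format would need a different pattern. The names COMBINED and parse_line are my own, not anything from Urchin.

```python
import re
from datetime import datetime

# Pattern for one line in Apache's "combined" log format (an assumption:
# your server's LogFormat may differ). Fields: client IP, identd, user,
# [timestamp], "request line", status, bytes, "referrer", "user agent".
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Split one combined-format log line into a dict, or return None."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # "07/Nov/2004:05:32:14 -0500" -> timezone-aware datetime
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request line is "METHOD PATH PROTOCOL", e.g. "GET /x.gif HTTP/1.1"
    parts = entry["request"].split()
    entry["method"], entry["path"], entry["protocol"] = (parts + [None] * 3)[:3]
    entry["status"] = int(entry["status"])
    # A dash in the size field means no body was sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```

Feeding it a line rebuilt from the pieces above (the two dashes after the IP are assumed, since they're the usual filler for the identity fields) gives back the IP, the parsed date, GET, the file path, HTTP/1.1, status 200, a size of 139 bytes, the referrer and the browser string.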

So stack a few million similar lines on top of each other, one for each hit, and you've got yourself an exact record of what's going on with your site. Better yet, you can sit there and watch the hits roll in with a few command line programs (tail -f on the log file, or tail -f piped through grep -v .gif to hide the image requests) to get a feel for what's happening in real time. It isn't fair to say these numbers are misleading, but it is accurate to say they can be misleading if left unanalyzed.
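If you'd rather do the same thing from a script, this is a rough Python stand-in for tail -f piped through grep -v .gif. It's only a sketch: follow() polls the file once a second, doesn't handle log rotation, and could hand back a half-written line, and both function names are made up for this example. One small difference from grep: the suffix check here is a plain substring match, whereas the dot in grep's pattern matches any character.

```python
import time

def skip_images(lines, suffixes=(".gif",)):
    """Yield only the lines that don't mention any of the given suffixes."""
    for line in lines:
        if not any(s in line for s in suffixes):
            yield line

def follow(path, poll_seconds=1.0):
    """Yield lines as they are appended to a file, like tail -f (runs forever)."""
    with open(path) as f:
        f.seek(0, 2)  # jump to the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)  # nothing new yet; wait and retry
```

Something like `for line in skip_images(follow("access_log"), (".gif", ".css")): print(line, end="")` would then scroll the non-image requests past as they arrive, where "access_log" is just a placeholder for wherever your server writes its log.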

Remember, each line represents a request made to the web server for any reason. An impatient reader could have reloaded a single page 6 times trying to make it move along quicker; an out-of-control bot could've gotten stuck in an endless loop and loaded the same 2 pages 10,000 times. Stranger things have happened, and we've seen many of them here @LISNews. So to really get a feel for how many people visit the site on any given day, it takes a combination of human brainpower and computer muscle to get an accurate count of your visitors.
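One crude way to spot a runaway bot like that is to count how often each address asked for each page and flag the pairs that cross some line. The sketch below assumes you've already pulled (IP, path) pairs out of the log. The cutoff of 100 is an arbitrary number I picked for illustration, not anything Urchin uses, and repeat_offenders is a made-up name.

```python
from collections import Counter

def repeat_offenders(entries, threshold=100):
    """Return {(ip, path): count} for pairs requested more than `threshold` times.

    `entries` is any iterable of (ip, path) tuples taken from the log.
    """
    counts = Counter(entries)
    return {key: n for key, n in counts.items() if n > threshold}
```

The impatient reader's 6 reloads stay well under the cutoff, while a bot looping over the same 2 pages 10,000 times shows up right away. Where to draw the line is exactly the kind of human judgment call I'm talking about.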

Urchin is the computer power behind most of the numbers I share for LISNews each month. I thought it might be interesting to first look at Urchin and how it does things, and then follow that up with a comparison to a popular (free) stats package to see how the two systems crunch the same numbers. Remember, every stats package does things a bit differently, so your site may report numbers in a completely different way depending on how things are recorded and measured.

So let's have a look at Urchin. Urchin is expensive, powerful, pretty and quite darn nice, if you ask me. You can take a tour at Urchin to see what it looks like; we're currently running version 5.7. Urchin is probably one of the more expensive log analyzers out there, but it's also one of the nicest, and one of its best features is the SVG graphs. Scalable Vector Graphics (SVG) allow some nice interaction with the graphs the system generates. SVG currently has limited support in Firefox, so you may find yourself firing up IE to look at some of the more interesting charts. One of the most important things to consider with any stats package is how it defines the terms at the heart of your stats: pageviews, hits, sessions, visitors. These are the important numbers, and they are also open to some interpretation. Here is how Urchin, and therefore LISNews, defines the numbers I report each month.

Hit - A hit is simply any request to the web server for any type of file. This can be an HTML page, an image (jpeg, gif, png, etc.), a sound clip, a cgi script, and many other file types. An HTML page can account for several hits: the page itself, each image on the page, and any embedded sound or video clips. Therefore, the number of hits a website receives is not a valid popularity gauge, but rather is an indication of server use and loading.

Pageview - A page is defined as any file or content delivered by a web server that would generally be considered a web document. This includes HTML pages (.html, .htm, .shtml), script-generated pages (.cgi, .asp, .cfm, etc.), and plain-text pages. It also includes sound files (.wav, .aiff, etc.), video files (.mov, etc.), and other non-document files. Only image files (.jpeg, .gif, .png), javascript (.js) and style sheets (.css) are excluded from this definition. Each time a file defined as a page is served, a pageview is registered by Urchin.

Page - Also known as a web page, a page is defined as a single file delivered by a web server that contains HTML or similar content. Any file that is not specifically a GIF, JPEG, PNG, JS (javascript), or CSS (style sheet) is considered a page.

Session - A Session is a defined quantity of visitor interaction with a website. The definition will vary depending on how Visitors are tracked. Some common visitor tracking methods and corresponding Session definitions:
• IP-based Visitor Tracking: A Session is a series of hits from one visitor (as defined by the visitor's IP address) wherein no two hits are separated by more than 30 minutes. If there is a gap of 30 minutes or more from this visitor, an additional Session is counted.
• IP+User Agent Visitor Tracking: A Session is a series of hits from one visitor (as defined by the visitor's IP address and user-agent, such as Netscape 4.72) wherein no two hits are separated by more than 30 minutes. If there is a gap of 30 minutes or more from this visitor, an additional Session is counted.
• Unique Visitor Tracking (cookie-based, such as Urchin's UTM): A Session is a period of interaction between a visitor's browser and a particular website, ending upon the closure of the browser window or shut down of the browser program.

Visitor - A Visitor is a construct designed to come as close as possible to defining the number of actual, distinct people who visited a website. There is of course no way to know if two people are sharing a computer from the website's perspective, but a good visitor-tracking system can come close to the actual number. The most accurate visitor-tracking systems generally employ cookies to maintain tallies of distinct visitors.
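To make those definitions a little more concrete, here's a rough sketch of how hits, pageviews, and IP-based sessions could be counted from log entries. This isn't Urchin's actual code, just my reading of the definitions above. Every request counts as a hit. Anything not ending in an image, .js, or .css extension counts as a pageview (treating .jpg like .jpeg is my assumption). A new session starts when an IP goes 30 minutes or more between hits.

```python
from datetime import datetime, timedelta

# Extensions excluded from pageviews, following the definitions above.
NON_PAGE_EXTENSIONS = (".jpeg", ".jpg", ".gif", ".png", ".js", ".css")
SESSION_GAP = timedelta(minutes=30)

def is_pageview(path):
    """True unless the path ends in an image, script, or stylesheet extension."""
    path = path.split("?", 1)[0].lower()  # ignore any query string
    return not path.endswith(NON_PAGE_EXTENSIONS)

def count_totals(entries):
    """Count hits, pageviews, and IP-based sessions.

    `entries` is an iterable of (ip, timestamp, path) tuples. A new session
    starts for an IP when 30 minutes or more pass between its hits.
    """
    hits = pageviews = sessions = 0
    last_seen = {}  # ip -> timestamp of that IP's previous hit
    for ip, when, path in sorted(entries, key=lambda e: e[1]):
        hits += 1  # every request is a hit
        if is_pageview(path):
            pageviews += 1
        previous = last_seen.get(ip)
        if previous is None or when - previous >= SESSION_GAP:
            sessions += 1
        last_seen[ip] = when
    return hits, pageviews, sessions
```

Switching to IP+User Agent tracking would just mean keying last_seen on (ip, user_agent) instead of ip. Cookie-based tracking can't be reconstructed from a plain log like this, because the log doesn't know when a browser window was closed.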

So for example, this month (Nov 2004) Urchin says the following:
Total Sessions 42,961.00
Total Pageviews 143,518.00
Total Hits 412,331.00
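A quick bit of arithmetic on those three totals gives a feel for how they relate. Keep in mind they include bots and feed readers, so the ratios are rough.

```python
# Urchin's November 2004 totals, copied from the figures above.
sessions, pageviews, hits = 42_961, 143_518, 412_331

pages_per_session = pageviews / sessions  # average pages seen per session
hits_per_page = hits / pageviews          # average files requested per page

print(f"{pages_per_session:.2f} pageviews per session")
print(f"{hits_per_page:.2f} hits per pageview")
```

That works out to about 3.34 pageviews per session and 2.87 hits per pageview, which is why the hit count alone says more about server load than about popularity.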

Does that mean we really had almost 43,000 visitors? No. Does it mean we came close to that? Well, that depends on what "close" means. Having 43,000 sessions in a month means that somewhere in the neighborhood of maybe 30,000 folks dropped in for a visit. If your sister-in-law stopped over at your house 6 times last month, you could say she was responsible for 6 sessions at your house. You didn't have 6 different visitors; one person came over 6 times and did something to annoy you (or maybe that's just me). One question that has always nagged at me is just how many unique, different people view a page with human eyes on any given day @LISNews. That number is almost impossible to pin down. I'll actually go so far as to say it IS impossible to answer accurately; we can only make an educated guess.

To get a more accurate count of actual human visitors to the site, we'd first use IP addresses as our best available gauge of uniqueness. So far this month Urchin says we've had 9,681 unique IP addresses. The top 10 addresses were responsible for about 18% of all sessions, roughly 7,700 out of about 43,000. This is still an imperfect measurement thanks to caching servers, DHCP and firewalls (many people can share one address, and one person can show up under several), but it's probably the best we can do. Next, we'd need to subtract out as many bots, search engines, feed readers, and anything else we can identify as non-human. What's left is a more accurate measurement of the people who viewed at least one page @LISNews. This month, after a considerable amount of addition and subtraction, I've made an educated guess that approximately 25% of the IP addresses that access LISNews are some sort of automated computerized non-humanoid thingy.
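Here's a simple sketch of that subtraction step. It counts unique IPs and then drops any address that ever sent a bot-like user agent. The list of telltale words is purely my assumption for illustration, not the list I actually used, and the approach will misjudge shared addresses behind proxies and firewalls.

```python
# Substrings often found in crawler and feed-reader user agents.
# This list is an illustrative assumption, not a complete or official one.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "feed", "rss")

def looks_automated(user_agent):
    """Heuristic: does the user-agent string contain a known bot marker?"""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)

def estimate_human_ips(entries):
    """Return (unique IPs, IPs never seen with a bot-like agent).

    `entries` is an iterable of (ip, user_agent) pairs from the log.
    """
    all_ips, bot_ips = set(), set()
    for ip, agent in entries:
        all_ips.add(ip)
        if looks_automated(agent):
            bot_ips.add(ip)
    return len(all_ips), len(all_ips - bot_ips)
```

Applied to this month's figures, knocking about 25% off 9,681 unique addresses leaves something like 7,260 that are probably people, give or take a lot.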

My guess would be that net ratings companies like Media Metrix and Nielsen have similar, if not worse, issues with reliability and consistency. The web was not designed to do much of what we use it for today, and keeping accurate counts of visitors is just one more example of how we have improvised any number of hacks to work our way toward a better system.

The standard Apache web logs do an excellent job of telling us which pages and files were most popular. LISNews has served more than 143,000 pages so far this month. The most popular were:
Page Pageviews % of all pageviews
1. / 20,087 14.00%
2. /lisnews.rss 14,295 9.96%
3. / 8,863 6.18%
4. /rss/descriptions.rss 5,315 3.70%
5. / 4,659 3.25%
6. /index.rss 3,847 2.68%
7. / 3,457 2.41%
8. /rss/popular.rss 3,175 2.21%
9. /robots.txt 3,020 2.10%
10. /article.php3 2,722 1.90%
Top 10 total: 69,440 48.38%
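The percentages in that table are just each page's count divided by the month's 143,518 pageviews. Here's a small sketch that reproduces them from the table's own numbers (shares is a name I made up):

```python
# Pageview counts for the ten rows of the table above, in rank order.
top_ten = [20_087, 14_295, 8_863, 5_315, 4_659, 3_847, 3_457, 3_175, 3_020, 2_722]
total_pageviews = 143_518  # Urchin's pageview total for the month

def shares(counts, total):
    """Percentage of `total` represented by each count, rounded to 2 places."""
    return [round(100 * c / total, 2) for c in counts]

covered = sum(top_ten)
print(f"{covered:,} pageviews, {100 * covered / total_pageviews:.2f}% of the total")
```

That prints 69,440 pageviews and 48.38%, matching the total row, and shares() gives back the per-page column from 14.00% down to 1.90%.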

So the top 10 pages were responsible for almost half of all pages served. Another strength of Slashcode is its ability to store a large amount of other data in the Slash database, so there's an entire second set of more accurate numbers that I'll write up some day: things like who posts the most comments, how many people have logged in this month, and how many comments were moderated up to 5.

I'll write up some comparisons between Urchin and some other stats packages, and really crunch some numbers at a later date.

