M$ DFS: The Love-Hate Relationship

Distributed File System (DFS) is not Micro$oft's idea of a joke. In reality, it is their attempt to - oh, how did they put it - "help simplify access to files and folders, system maintenance, help enhance availability and performance, and help lower total cost of ownership (TCO)". Let me break this down into the sum of its parts:

  • help simplify access to files and folders
  • system maintenance
  • help enhance availability and performance
  • help lower total cost of ownership

Help simplify access to files and folders

This is true. By unifying different servers and their shares into one share, DFS just makes things easier. Rather than spew a whole bunch of Micro$oft propaganda, just watch this flash video.
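To make the "one share" idea concrete, here is a toy model of what a DFS namespace does: one unified path is translated into the real server shares behind it. This is only a sketch, not how Windows implements referrals. The domain, server, and share names are made up, and `resolve` is a hypothetical helper.

```python
# Hypothetical namespace table: DFS link path -> list of target shares.
# Every domain, server, and share name here is invented for illustration.
NAMESPACE = {
    r"\\example.local\files\projects": [r"\\fs01\projects", r"\\fs02\projects"],
    r"\\example.local\files\home": [r"\\fs03\home"],
}


def resolve(path, namespace=NAMESPACE):
    """Map a path under the unified namespace to its candidate real locations."""
    lowered = path.lower()
    # Check the longest link first, so a nested link wins over its parent.
    for link in sorted(namespace, key=len, reverse=True):
        prefix = link.lower()
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            rest = path[len(link):]
            return [target + rest for target in namespace[link]]
    raise LookupError(f"no DFS link covers {path!r}")
```

The user only ever types the `\\example.local\files\...` path. Which file server actually answers is a detail they never see, and that is the part that really is easier.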

System Maintenance

This could be true. I don't have DFS working for this purpose, so I don't really know (and honestly, don't really care).

Help enhance availability and performance

I would say that most of this is true. With the File Replication Service (FRS) in Windows Server 2003, availability is increased by having the documents automagically replicated between different file servers. Performance is increased in the sense that the downtime of a replicated share for the end user is, or at least should be, no time at all. Network performance, on the other hand, is decreased because of the necessity of replication: it chews up bandwidth between the available servers every time a change is made.

Help lower the total cost of Ownership

This is pure BUNK. Should DFS, or actually the underlying service FRS, fail systemwide, you have to rebuild it from scratch. The funny part is that I cannot find a web-based version of the event ID I went through, describing the recovery process, anywhere on the Micro$oft website. I did, however, find a copy posted in a forum, and I will post it here for all to read:

The File Replication Service is in an error state. Files will not replicate
to or from one or all of the replica sets on this computer until the
following recovery steps are performed:

Recovery Steps:

[1] The error state may clear itself if you stop and restart the FRS
service. This can be done by performing the following in a command window:

net stop ntfrs
net start ntfrs

If this fails to clear up the problem then proceed as follows.

[2] For Active Directory Domain Controllers that DO NOT host any DFS
alternates or other replica sets with replication enabled:

If there is at least one other Domain Controller in this domain then restore
the "system state" of this DC from backup (using ntbackup or other
backup-restore utility) and make it non-authoritative.

If there are NO other Domain Controllers in this domain then restore the
"system state" of this DC from backup (using ntbackup or other backup-restore
utility) and choose the Advanced option which marks the sysvols as primary.

If there are other Domain Controllers in this domain but ALL of them have
this event log message then restore one of them as primary (data files from
primary will replicate everywhere) and the others as non-authoritative.

[3] For Active Directory Domain Controllers that host DFS alternates or
other replica sets with replication enabled:

(3-a) If the Dfs alternates on this DC do not have any other replication
partners then copy the data under that Dfs share to a safe location.
(3-b) If this server is the only Active Directory Domain Controller for
this domain then, before going to (3-c), make sure this server does not have
any inbound or outbound connections to other servers that were formerly
Domain Controllers for this domain but are now off the net (and will never be
coming back online) or have been fresh installed without being demoted. To
delete connections use the Sites and Services snapin and look for
Sites->NAME_OF_SITE->Servers->NAME_OF_SERVER->NTDS Settings->CONNECTIONS.
(3-c) Restore the "system state" of this DC from backup (using ntbackup or
other backup-restore utility) and make it non-authoritative.
(3-d) Copy the data from step (3-a) above to the original location after
the sysvol share is published.

[4] For other Windows servers:

(4-a) If any of the DFS alternates or other replica sets hosted by this
server do not have any other replication partners then copy the data under
its share or replica tree root to a safe location.
(4-b) net stop ntfrs
(4-c) rd /s /q c:\windows\ntfrs\jet
(4-d) net start ntfrs
(4-e) Copy the data from step (4-a) above to the original location after
the service has initialized (5 minutes is a safe waiting time).

Note: If this error message is in the eventlog of all the members of a
particular replica set then perform steps (4-a) and (4-e) above on only one
of the members.

What this is basically saying is that you need to move your data out of the existing share directory, because the directory will become empty if you choose to replicate that share again!!!
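For the plain member-server case, steps (4-a) through (4-e) in the event text above can be sketched as a script. Treat this as a rough outline under assumptions, not a tested recovery tool. The function name, its parameters, and the injectable `run` callback are all my own invention. Only the `net stop ntfrs` and `net start ntfrs` commands, the jet directory, and the five-minute wait come from the message itself.

```python
import shutil
import subprocess
import time
from pathlib import Path


def run_cmd(args):
    """Default runner: execute a command and raise if it fails."""
    subprocess.run(args, check=True)


def rebuild_frs_database(share_dir, safe_dir, jet_dir, run=run_cmd, wait_seconds=300):
    """Sketch of steps (4-a) through (4-e); every name here is hypothetical."""
    share_dir, safe_dir, jet_dir = Path(share_dir), Path(safe_dir), Path(jet_dir)

    # (4-a) Copy the share's data to a safe location (safe_dir must not exist yet).
    shutil.copytree(share_dir, safe_dir)

    # (4-b) Stop the File Replication Service.
    run(["net", "stop", "ntfrs"])

    # (4-c) Remove the FRS database directory, roughly what `rd /s /q` does.
    shutil.rmtree(jet_dir, ignore_errors=True)

    # (4-d) Start the service again.
    run(["net", "start", "ntfrs"])

    # (4-e) Give the service time to initialize, then copy the data back.
    time.sleep(wait_seconds)
    shutil.copytree(safe_dir, share_dir, dirs_exist_ok=True)
```

On a real server the jet directory would be the `c:\windows\ntfrs\jet` path from the message. The default wait of 300 seconds matches its "5 minutes is a safe waiting time". The `dirs_exist_ok` argument needs Python 3.8 or newer.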

This is what happened to me. I mean, who has time for this nonsense when your users need access to that data NOW? Luckily, Server 2003 moves all that data into a "pre-existing directory" within the shared folder, so all it takes is a copy back into the share and all is well.

What I don't understand is why they couldn't build in some kind of crash-recovery feature, like a file merge similar to rsync. Is that really so much to ask?
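For the record, the kind of merge I have in mind is not complicated. Here is a minimal sketch, assuming a simple "copy it back unless the share already has a copy at least as new" rule based on modification times. It is not rsync's actual algorithm and not anything FRS provides. `merge_back` is a made-up name, and the source directory would be wherever your displaced data ended up.

```python
import shutil
from pathlib import Path


def merge_back(source_dir, share_dir):
    """Copy files from source_dir into share_dir unless share_dir already
    holds a copy that is at least as new. Returns the relative paths copied."""
    source_dir, share_dir = Path(source_dir), Path(share_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = share_dir / rel
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # the share's copy is current, so leave it alone
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Running it twice in a row copies nothing the second time, which is exactly the behaviour you want when users are already back to editing files in the share.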

Comments

Dear God

It happened again!! How many lost man hours and TCO does that ERROR add? GRRRRRRRRRRRR.
