Tutorial: Binary Files From Usenet
What are all these different files? What do I do with
them? Why is this so complicated?
Split files
  .001, .002, .003, etc. files
Archives
  .zip files and other similar archives
  .rar files (and .rxx)
Parity correction files (often referred to as Parchive files)
  .par files (and .pxx)
  .par2 files
CD Images
  .bin/.cue and other CD image files
Video Files
  .dat files and .vob files
Extra files you'll often see
  .nfo and .diz files
  .nzb files
When you want to download binary files, you might think
you are just going to get .mp3 files for sounds, .jpg files,
etc. for pictures, .mpg, .avi, .wmv, etc. for video files,
and so on - but in order to manage large posts, almost everyone
who posts files will archive them into sets of files,
all nearly the same size, that contain the file or files
that you want. So downloading the binary files becomes a
multi-step process. Your newsreader software will hopefully
do some of the heavy lifting for you, downloading individual
encoded posts and combining them into the target binary
files - but there are still some steps that usually have
to be taken before you are done.
First step - know what you are looking at!
In Windows 95 and later Windows versions, it became fashionable
to hide files and extensions from the user of the computer,
so once you download files, it's sometimes difficult to tell
exactly what kind of file you are looking at. You really
want to be able to see and work with the file extensions
- so if you are using Windows, perform the following steps
if you have not already. First, right-click My Computer
and choose "Explore". In Windows Explorer,
choose Tools >> Folder Options and click on the
"View" tab. Deselect "Hide File Extensions for known types".
Click on the "Like Current Folder" button to make this setting
global. Now you will be able to see and edit the extensions
of the files you download.
Tools>>Folder Options>>View
Dialog
In many newsgroups, the files are simply posted in their
original format (.mp3, etc.), and all you have to do is
download and enjoy them. Just double-click on the file and
it will open in Windows Media Player or whatever application
you use for various media. As you start looking at more
newsgroups with different kinds of files posted to them,
you'll start seeing files that are in strange formats and
you may wonder what to do with them. Some files are archives
or file containers that actually include what you are looking
for. Others are files that you can load into various utilities
to make sure that the binaries you want are complete and
have not been corrupted or damaged in transit - and in some
cases, even repair the damaged or missing files. Others
are files that describe the binaries being uploaded, or
even describe the upload in such a way that your newsreader
can load one file and have a "recipe" for downloading the
entire set of posted files. Here is a discussion of many
of the most common files you'll encounter in the binaries
newsgroups on Usenet, and some tips and tricks for dealing
with them, and where possible, some links to utilities and
helper programs that will assist you.
Split files
.001, .002, .003, etc. files
These are simply large files that have been
split into pieces for easier handling during upload and
download. Often they are very large MPEG or other files
that are already in a highly compressed form, so that further
compression isn't needed. To recombine them, the binary
files just need to be "added" back together in the order
that the split parts were taken from the original file.
This can sometimes be done with a simple DOS command or
batch file:
    copy /b file.mpg.001+file.mpg.002+file.mpg.003 file.mpg
Note that this method uses "copy /b" for
binary copy, not simply "copy". Now, imagine
that you have a file split into 115 parts - that would
be a lot of typing. Much better to use a utility made
for the purpose! Two of the most popular utilities are
HJSplit (http://www.freebyte.com/hjsplit)
and Mastersplitter (http://www.tomasoft.com).
These utilities will allow you to rejoin split files like
this, and to split files if you need to. They also include
many features that will help make sure that you are not
missing any parts and that the parts go together in the
right order.
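For the curious, the "add the pieces back together in order" idea can be sketched in a few lines of Python. This is only an illustration of what the "copy /b" command does, not a replacement for the utilities above. The function name join_split_files is made up, and the sketch assumes the parts are named file.ext.001, file.ext.002, and so on, all in one directory with no gaps in the numbering.

```python
import re
import shutil
from pathlib import Path


def join_split_files(first_part):
    """Join name.001, name.002, ... into name, in numeric order.

    Hypothetical helper: assumes every part sits next to the .001 file
    and that the numbering has no gaps.
    """
    first = Path(first_part)
    target = first.with_suffix("")  # "file.mpg.001" -> "file.mpg"
    pattern = re.compile(re.escape(target.name) + r"\.(\d{3})$")

    # Collect (number, path) pairs for every part of this set.
    parts = []
    for candidate in first.parent.iterdir():
        match = pattern.match(candidate.name)
        if match:
            parts.append((int(match.group(1)), candidate))
    parts.sort()

    # Refuse to join if a part is missing from the sequence.
    numbers = [number for number, _ in parts]
    if numbers != list(range(1, len(parts) + 1)):
        raise FileNotFoundError(f"missing parts, found only {numbers}")

    # Binary append, the same thing "copy /b" does.
    with open(target, "wb") as out:
        for _, part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return target
```

Calling join_split_files("file.mpg.001") would write file.mpg next to the parts. Sorting by the numeric extension rather than by name is a deliberate choice, so a directory listing in an odd order can't scramble the result.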

HJSplit Utility showing a set of files selected
to be joined. Just select the ".001" file and the rest
will be located automatically if they are in the same
directory.
.zip files and other similar archives
This is a common kind of archive format that most people
are familiar with. In Usenet newsgroups, you will often
see sets of small files bundled together into .zip archives
and then uploaded as a single file. Often programs and
utilities that are downloaded from websites will also
be archived together as a .zip file. The .zip files not
only collect a number of files together for easier downloading,
they also compress the file data so that it takes less
time to download. Other similar but less used formats
are .arc files (an older DOS type with versions by Thom
Henderson's SEA and Phil Katz' PKWare), .gz and .tar files
(common UNIX types), .arj files (an alternative to ZIP with
file splitting, encrypting, and advanced validity checking),
.lzh files (a simple public domain compression), .sit
(Macintosh Stuffit), and .cab files (usually used for
installation packages). As a side note, a fascinating piece
of hacker/net history from the BBS days is the story of the
ARC format, the fight between Henderson and Katz, the
subsequent birth of ZIP, and Katz' tragic death in 2000.
I met Katz at COMDEX in '86 or '87 and found him to be an
insanely intelligent guy. Just Google "Phil Katz" for many
interesting articles from both sides of the argument.
There are a number of utilities that can open
and extract files from these various kinds of archives.
WinZip is a utility that runs under most versions of Windows,
and can extract from all of the formats listed above except
.sit (Stuffit) files. It is available as a try-before-you-buy
download at http://www.winzip.com.
The originator of the ZIP format is PKWare (founded by
Katz, mentioned above), and their utilities can be found
at http://www.pkware.com.
For Linux and Mac users, Zip utilities can be found at
http://www.info-zip.org.
At ArjSoft's website, http://www.arjsoft.com, you'll
find utilities for .arj files for use with DOS/Windows
environments, and Aladdin Software (http://www.aladdinsys.com)
makes utilities for Macintosh, Unix, Solaris, and Windows
that handle not only .sit archive files, but .arj, .zip,
and many other formats as well.
.rar files (and .rxx)
I'm discussing .rar files separately because RAR is by far
the most widely used format on Usenet for distributing
binaries. There are a number of fine points that I'd like
to talk about in managing RAR files as well. It's not that
the RAR format has more complications or difficulties than
any other, but it is so widely used, and you'll see so many
versions and techniques used to create the RAR files, that
you are more likely to run into the associated problems,
so it is a good file type to know well. The
best software to use for creating, combining, and uncompressing
RAR files is WinRAR. It is available for Windows, Macintosh,
UNIX, Linux, and PocketPC from Rarlab - you can download
try-before-you-buy versions at their website (http://www.rarlab.com).
There are two kinds of RAR file sets that you'll commonly
run into. RAR file sets created with an older version
of WinRAR (before version 3.0) have filenames that will
look somewhat like this: filename.r00, filename.r01, filename.r02,
etc., plus a file named filename.rar, which is the first
volume of the set even though it sorts last in an
alphabetical listing.
Newer versions of WinRAR (version 3.0 and above) create
file sets with names that look like this: filename.part01.rar,
filename.part02.rar, filename.part03.rar, etc., counting
up to the total number of files in the set. Newer versions
of WinRAR can create the "old" style filenames, but usually
this isn't done.
The most important thing to remember when using WinRAR
is to make sure that you always have the latest version
if you are downloading RAR files. Most of the problems
that people have with files being corrupt or unusable
come from attempting to open a set of RAR files in a version
of WinRAR that is older than the version used to create
the RAR files. This is also one of the first things to
check (after running a PAR or SFV validation as outlined
below) if you attempt to open a set of RAR files and get
errors saying that the compression isn't valid or other
errors of that type.
WinRAR will also give you warnings and errors if any of
the parts are corrupt or missing from having been uploaded
or downloaded improperly. Sometimes the file errors are
negligible, and you can actually still use the files -
by checking "keep broken files" in the UnRAR options,
you can save the file even though it's damaged, and see
if it's still usable.
If the RAR files are loaded out of order - say you double-clicked
on the ".r04" file instead of the ".rar" file, or the
"part04.rar" file rather than the "part01.rar" file -
it's possible to accidentally unRAR only part of the files
in the archive. Sometimes in this case you will be prompted
with options including "extract all files and folders
from the current" - if so, always click on that option.
It's safest, though, to simply make a habit of opening
the RAR file set by clicking on the ".rar" or "part01.rar"
file to make sure the set is completely loaded.
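If you script your downloads, the "which file do I open first?" rule can be written down as a small Python sketch. The function name first_rar_volume is hypothetical, and it only knows the two naming styles discussed here; it assumes that an old-style set is opened through its ".rar" file, as recommended above.

```python
import re


def first_rar_volume(filenames):
    """Pick the file to open first from a list of RAR volume names.

    Hypothetical helper covering only the two naming styles described
    in the text.
    """
    new_style = re.compile(r"\.part(\d+)\.rar$", re.IGNORECASE)

    # New style: name.part01.rar, name.part02.rar, ... -> lowest part number.
    numbered = []
    for name in filenames:
        match = new_style.search(name)
        if match:
            numbered.append((int(match.group(1)), name))
    if numbered:
        return min(numbered)[1]

    # Old style: name.rar plus name.r00, name.r01, ... -> the .rar file.
    for name in filenames:
        if name.lower().endswith(".rar"):
            return name
    return None
```

The new-style check runs first because those names also end in ".rar"; testing for the plain extension first would pick whichever part happened to be listed first.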
Once the file set is open, the dialog will show all the
files in the archive. Select them all, right-click them,
and choose "Extract to the specified folder", and choose
the folder that you want to save the file(s) to. If you
type in a folder name that does not exist, it will be
created.

WinRAR with a set of RAR files loaded, ready
to extract. This is the same set of files used in the
PAR and PAR2 examples below.
But what if the archive files are corrupt or incomplete?
Experienced and considerate posters will almost always
include some recovery files along with the posted files.
The first thing to do is to download the
files or archives themselves. If there is a file with
the extension ".sfv" in the post, download it as well -
it lets you test whether the download is complete and undamaged.
To use these files, get one of several utilities that read
them, including cSFV (freeware) at xxx, or QuickSFV from
xxx. The SFV files are quite small, so they are easy and
fast to download. Just open the SFV file, and the program
will test the downloaded files and notify you of any that
are missing or damaged.
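What an SFV checker does can be sketched in Python. I'm assuming the usual SFV layout here: one "filename checksum" pair per line, with the CRC-32 written as eight hex digits and comment lines starting with a semicolon. The function names are made up for the example.

```python
import zlib
from pathlib import Path


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def check_sfv(sfv_path):
    """Return {filename: "ok" | "damaged" | "missing"} for an .sfv file."""
    sfv = Path(sfv_path)
    results = {}
    for line in sfv.read_text(encoding="latin-1").splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # blank line or comment
            continue
        name, expected = line.rsplit(None, 1)  # the checksum is the last field
        target = sfv.parent / name
        if not target.exists():
            results[name] = "missing"
        elif crc32_of(target) == int(expected, 16):
            results[name] = "ok"
        else:
            results[name] = "damaged"
    return results
```

Note that a check like this can only tell you that something is wrong with a file. It cannot repair anything, which is where the parity files come in.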
If any of the files are missing or damaged, they can be
repaired using any PAR or PAR2 files that were included
with the post. These files work on the same principle as
recoverable disk drive volumes, using
"Parity" records (hence the "par" designation).
Parity correction files (often referred
to as Parchive files)
.par files (and .pxx)
This type of PAR set, based on the PAR1
specification, consists of one file with the extension ".par",
a small index file used to check the files and
define the set in much the same way that an SFV file is
used, and a number of files with the extensions .p01,
.p02, .p03, etc. (I'll call these collectively pxx files)
which are the actual parity data files. Each of the pxx
files is just a bit larger than the largest file in the
set of files to be repaired. To use these files, you will
need one pxx file for each file that you need to replace
or repair. Open the par file with any of several utilities
- FSRaid, SmartPAR, or QuickPar. Sourceforge (http://parchive.sourceforge.net)
has both technical information and links to download all
of these clients. My preference in PAR1 software is Fluid
Studios' FSRaid.
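To get a feel for how a parity file can stand in for a missing file, here is the idea in its very simplest form, as a Python sketch. Be warned that this is a simplification: as far as I know the real PAR formats use Reed-Solomon coding so that several recovery files can replace several missing files, while plain XOR parity as shown here can only rebuild one. The sample data and function names are made up.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def pad(blocks):
    """Pad blocks with zero bytes to the length of the longest one."""
    size = max(len(block) for block in blocks)
    return [block.ljust(size, b"\0") for block in blocks]


# Three "files" of a posted set, padded to a common length.
files = pad([b"part one", b"part two!!", b"part 3"])
parity = xor_parity(files)

# Pretend the second file never arrived: XOR the survivors with the
# parity block and the missing data falls out.
rebuilt = xor_parity([files[0], files[2], parity])
assert rebuilt == files[1]
```

The padding step also hints at why a recovery file has to be at least as large as the largest file in the set: it needs one parity byte for every byte position of the longest file.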
Set of downloaded RAR files with PAR files
for repair. Note that "testfile.part04.rar" is missing!
Once the files are checked, the software will allow you
to perform the repairs. The repaired files are completely
compatible with the originals - there is no loss of data
or quality from repairing with this method. The method
I usually use is to download just the ".par" file and
the files I want, then use FSRaid to test the files. If
any repairs are necessary, it will tell you how many pxx
files you will need - go back to the newsgroup and download
just the number you need, if any, so you don't waste
download megabytes getting unnecessary files.
Once you've loaded the .par file in FSRaid,
if there are enough pxx files available, it will automatically
perform the repair. Excellent!
.par2 files
The PAR2 specification has several advantages. Instead of
requiring a complete replacement file to repair
an incomplete source file, PAR2 divides the entire post into
a set of smaller blocks, so fewer files are required to
effect repairs - although for files that are missing entirely
you will still need about as many megabytes of PAR2 files
as the missing files contained. If your newsreader allows
you to download and combine incomplete files for which not
all the posts are available, download as much of them as
you can. Even incomplete, they will let you repair the posts
with a relatively small number of PAR2
files compared to the amount of PAR files you would need
to download to do the same thing.
The same set of RAR files, put with PAR2
files for repair this time. In this example, "testfile.part04.rar"
was downloaded incompletely - note the file size.
To use PAR2 files, you will need to use software that
can work with them such as QuickPar (see above). When
it checks the files you've downloaded, it will check each
file part by part rather than as a whole, and even partial
files can be used along with the blocks from the PAR2
files to create complete files. With "old" PAR files,
a file that was incomplete or corrupt might as well
have been missing completely, but with PAR2 files you can use
partial or partially corrupted files in the repair. The
only disadvantage is that it's slower, and the PAR2 files
take longer to create, but the reduction in the amount
you need to download to repair damaged files makes up for
it.
After loading the PAR2 file with QuickPar,
the testing shows that the file was incomplete, and that
we have enough blocks to recover it.
After the repair is done, simply exit and
open the RAR archive as usual. Note that it kept a copy
of the damaged / incomplete file, with a ".1" extension
tacked on to the end. Usually these can simply be deleted,
but it may be useful in some rare cases. Once in a while
QuickPar will repair a file that actually shouldn't
have been repaired - then WinRAR says that the repaired
file is corrupt! In that case, just delete the repaired
file and rename the ".1" copy back to ".rar", and run
WinRAR to see if that fixes it. This has only happened
to me once or twice in the past year or two since PAR2
became widely used.
OK, you've downloaded the files, and unRARed them, now
what? Sometimes after completing a download you'll wind
up with files that you were not expecting - and don't
know what to do with. Different people post things in
a number of different ways, depending on what software
they commonly use and what is convenient for them. Always
remember the three rules of Usenet binary newsgroups -
1) No One Owes You Anything.
2) There Ain't No Such Thing As A Free Lunch.
3) See The First Rule.
What I'm trying to get at is that if someone
posts something you want, but it's in a format you are
not familiar with or that's not your first choice, it's
up to you to get the utilities or tools to work with their
files, not the other way around. If you try to persuade
the posters in a newsgroup to change the way they do things
to make it more convenient for you, the likely result
is that no one will pay any attention, and many will simply
"killfilter", or block, any messages you post to the group
as annoyances. Here are some tips and pointers that might
help you work with files that are puzzling you.
CD Images
.bin and .cue and other image files
When you unpack the RAR files all you
get is a great big .bin file and a little tiny .cue file!
What's wrong? BIN/CUE files (they are a set, one of each
to a CD) are CD image files, the most commonly used CD
image format on Usenet. Rather than upload the contents
of a CD separately, as a bunch of files, the poster has
uploaded an image for burning a complete CD that exactly
duplicates a CD they've made or obtained. Other formats
for CD images include DAO (Duplicator), TAO (Duplicator),
ISO (Nero, BlindRead, Easy CD Creator, many others), IMG
(CloneCD), CCD (CloneCD), CIF (Easy CD Creator), NRG
(Nero), C2D (WinOnCD), CDI (DiscJuggler), PXI (PlexTools),
MDS (Alcohol 120%), MDF (Alcohol 120%), VC4 (Virtual CD),
BWT (BlindWrite). There are several reasons why a poster
might choose to upload in this format rather than simply
uploading the files. In some cases the CD has special
attributes, such as being bootable, that are preserved
in this kind of upload. The CD may be a Video CD (VCD
or SVCD) or a DVD that the author wants to upload with
the menus and chapters setup intact rather than just uploading
MPEG files. Or it may simply be a matter of convenience
for the poster to manage files in this way. They are the
poster, so it's up to them to decide, even if it's a little
inconvenient for you. No worries though, you can deal
with these files if you want with a few simple utilities.
A BIN/CUE fileset - The BIN file contains
the data, and the CUE file is a description for how a
CD burner program should handle it.
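Since the CUE file is just a short text file, you can peek inside it yourself. The Python sketch below pulls out the name of the BIN file and the track list. The sample sheet and the name "testfile.bin" are made up for illustration, and the sketch only understands the FILE and TRACK lines of a simple single-BIN sheet, which is the layout I'd expect for a typical data CD image.

```python
import re


def read_cue(text):
    """Return (bin_filename, [(track_number, track_mode), ...]) from CUE text."""
    bin_name = None
    tracks = []
    for line in text.splitlines():
        line = line.strip()
        file_match = re.match(r'FILE\s+"?(.+?)"?\s+BINARY$', line, re.IGNORECASE)
        track_match = re.match(r"TRACK\s+(\d+)\s+(\S+)$", line, re.IGNORECASE)
        if file_match:
            bin_name = file_match.group(1)
        elif track_match:
            tracks.append((int(track_match.group(1)), track_match.group(2)))
    return bin_name, tracks


# A made-up example sheet for a single-track image.
example = '''FILE "testfile.bin" BINARY
  TRACK 01 MODE2/2352
    INDEX 01 00:00:00
'''
print(read_cue(example))
```

This is handy when a burning program complains that it can't find the image: the FILE line has to name the BIN file exactly as it exists on your disk, so if the poster renamed the BIN after creating the set, editing that one line in a text editor is often all the fix that's needed.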
You have a couple of options with files of this type.
One is simply to go ahead and burn the CD, the other is
to "Open" the image file, and "Extract" the files from
it, like copying files off a real CD.
In order to burn the image to a CD you may need just the
right software for the image type. Many of the common
formats, ISO for example, are supported by many CD burning
programs, but some are not. Here are some guidelines:
For "ISO" images you can use several programs. This is
an open-standards format, like the ISO-9660 file format
that's used to burn the CDs, so it's built into a lot
of CD burning software. For others, you may have to use
the software that created it, or open the file and extract
the contents as outlined below. Easy CD Creator (http://www.roxio.com)
can work with ISO and CIF files. Nero Burning ROM (http://www.nero.com)
can work with several, including ISO, BIN/CUE, NRG, and
others. Alcohol 120% (http://www.alcohol-software.com/)
can work with ISO, MDS, MDF, and others. A neat little
utility that I often use is Blindwrite (http://www.blindwrite.com),
which can burn CDs from ISO, BIN/CUE, and BWT files.
Select the CUE file with Blindwrite - it
will load the BIN file automatically
Insert a disc, and Blindwrite will make you
an exact copy of the original CD
If you simply want to open the CD image file and
look at the files or copy them to your hard disk, you
can use a utility such as IsoBuster (http://www.smart-projects.net/isobuster).
It allows you to open many different image file formats
including all those listed above. Simply open the image
file and you will be presented with the directory structure
of the CD as if it were open in Windows Explorer - you
can copy files and directories from the CD image to your
hard drive.
Open a BIN file with IsoBuster - it does
not need the CUE file. Right-click on selected files to
get extract options.
Video Files
.dat files and .vob files
If the CD image is a Video-CD (VCD or SVCD) or DVD, it
may not be apparent how to get at the actual video files.
If you just want an MPEG file that you can view on your
computer or convert to a different format, and not burn
to a disc, you'll need to extract the files containing
the video, and you may need to "convert them a little"
before they are useful. On a VCD/SVCD the actual video
will be in MPEG-1 (VCD) or MPEG-2 (SVCD) format, and will
be in the "MPEGAV" directory, named something like "AVSEQ01.DAT",
or "AVSEQ01.MPG". There may be several files of this type,
i.e. AVSEQ01.DAT, AVSEQ02.DAT, AVSEQ03.DAT, etc. - each
of the files being a different segment of the disc's video.
Some may simply be intros, credits, etc., and some (usually
the largest files) will be the content you are interested
in. There may be other directories on the disc as well,
depending on how it was authored, such as "EXT" or "SEGMENT"
that may contain content - look in all of the folders
for files named "*.DAT" or "*.MPG" to get all the files
that may be content you want. Usually, to watch these
files and see what's in them, if they are named *.MPG
you can simply double-click them. If they are named *.DAT
you can rename them, changing the DAT extension to the
MPG extension, and then double-click and play them in
Windows Media Player or whatever you have set as the default
for MPEG files.
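If there are a lot of DAT files scattered over several folders, the hunt-and-rename step can be scripted. Here is a Python sketch that copies every DAT file it finds under an extracted disc folder into a destination folder with an .mpg extension. The function name and folder arguments are made up, and I've chosen to copy instead of rename so the extracted originals stay untouched.

```python
from pathlib import Path
import shutil


def copy_dat_as_mpg(source_dir, dest_dir):
    """Copy every *.DAT under source_dir into dest_dir with an .mpg extension."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for dat in sorted(Path(source_dir).rglob("*")):
        # Match the extension case-insensitively (.DAT or .dat).
        if dat.is_file() and dat.suffix.lower() == ".dat":
            target = dest / (dat.stem + ".mpg")
            shutil.copyfile(dat, target)
            copied.append(target)
    return copied
```

One caveat of this simple approach: two DAT files with the same name in different folders would overwrite each other in the destination, so check the returned list if the disc has an unusual layout.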
Sometimes, however, this doesn't work quite that simply.
It usually will, but in the process of creating the VideoCD,
some software can change the MPEG file's format slightly
when writing the DAT file (especially if the DAT file
was made up of more than one MPEG file joined together
in the VCD creation process) - in this case you'll need
to repair or re-rip the DAT file. There are several utilities
that can help. One thing to try is extracting the DAT
file from the VCD image with IsoBuster using the "Extract
but FILTER only M2F2 mpeg frames" option. (Right-click
on the filename, and choose that option instead of simply
"Extract".) This will convert any frames in the MPEG file
that are causing errors and will simply copy the rest
of the file normally. You may see some blurring
when the problem frames are played back, but it's
generally hardly noticeable. Another utility to try is
VCDGear (http://www.vcdgear.com
- freeware, but deserving a donation if you use it!) -
load the DAT file and choose the "Fix MPEG Errors" Option
and save it to MPEG. If the DAT file contains multiple
MPG files, each will be created separately. VCDGear can
also read MPG files directly out of several different
CD Image formats. Very useful.
Convert a DAT file to mpg file(s) with VCDGear
In the case of a DVD image, you will find VOB files (VOB
stands for Video OBject). These are in a slightly different
format from the MPG files you might be used to, as they can
contain multiple titles and streams. The way to deal with
them is to use VCDGear after renaming the VOB file to
MPG, selecting "mpg --> mpg" and "Fix MPEG Errors".
It will create one or more mpg files based on the contents
of the VOB file. If the VOB file is encrypted (i.e. from
a commercial DVD), these methods won't work, and it is
currently illegal to distribute tools for removing that
copy protection.
Extra files you'll often see
.nfo and .diz files
These files used to have a set format and were used (in
ancient times) by BBSes to take uploaded files and put
them into the proper category (Games, Utilities, etc.)
and add their descriptions to the BBS download menus.
In the last several years though, they are mostly used
the same way any .txt files are used, although vestiges
of a common format are still apparent, and there are several
tools out there to generate the files from a form in a
consistent way. They can't be assumed to be "structured
data" anymore though. Simply open them with Notepad or
any text editor / viewer.
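If one of these files shows up full of odd characters when you open it this way, a likely cause is that it was written with the old DOS character set and your editor is reading it as something else. A minimal Python sketch, assuming the file uses DOS code page 437 (my assumption, and the usual case for files in the BBS tradition); the function name is made up:

```python
def read_nfo(path):
    """Decode an .nfo or .diz file as code page 437 (assumed DOS encoding)."""
    with open(path, "rb") as handle:
        return handle.read().decode("cp437")
```

Printing the result of read_nfo on a terminal with a fixed-width font should show the line and box characters the way the author drew them.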
These are text files that describe the files in the download.
Often they contain information about how long the poster
intends to take to complete the upload, their preferences
for repost or fill requests, and other information. It
is always a good idea to read the .nfo file before downloading
a large number of files, to make sure that it's something
you want, first of all, and so that you will know how
the poster intends to go about uploading the files.
Starting with Windows XP, the ".nfo" file extension is
automatically associated in Windows with the "System Information"
application, which will simply give you an error that
they are invalid files. The chance that you will ever
actually need to save and open "System Information" files
is virtually nonexistent - it is perfectly acceptable
to right-click on one of these files, choose "Open with...",
and select Notepad as the application to use. If you select
"Always use this application to open files of this type",
it will re-assign the application association from "System
Information" to "Notepad", which will make your life easier.
In the unlikely event that you actually do need to open
a System Information file, you can simply right-click
on it and choose "Open with..." and "System Information",
so you are not losing anything important by doing this.
A lot of the NFO, DIZ, TXT, etc. files included with
uploads have a ton of odd little characters all over
them and may be hard to read. This is "Art". In order
to see them properly in Notepad, change the font to a
fixed-width (non-proportional) font like
Courier. Some are quite pretty. This will also make it much
easier to read the text buried in little balloons in the
"Art".
.nzb files
This file format is a descriptor of the entire posted
set of files that you can load into many newsreaders (such
as Newsbin Pro) to select the posts for download without
having to find and select all the parts. It's a nice convenience,
but you should be careful to deselect the extra Pxx or
PAR2 files if you want to save download megabytes by not
downloading them unless needed.
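An NZB file is plain XML, so you can inspect one before handing it to your newsreader. The Python sketch below lists each posted file with its total size and then filters out the recovery files. It assumes the layout I understand NZB files to use: "file" elements with a "subject" attribute, each containing "segment" elements with a "bytes" attribute, all in the newzbin namespace shown in the code. The function names are made up.

```python
import re
import xml.etree.ElementTree as ET

# Assumed XML namespace of NZB documents.
NZB_NS = "{http://www.newzbin.com/DTD/2003/nzb}"

# Subjects that mention a .par, .par2 or .pNN recovery file.
RECOVERY = re.compile(r"\.(par2?|p\d\d)\b", re.IGNORECASE)


def list_nzb_files(nzb_text):
    """Return [(subject, total_bytes)] for each <file> in an NZB document."""
    root = ET.fromstring(nzb_text)
    files = []
    for entry in root.iter(NZB_NS + "file"):
        size = sum(int(seg.get("bytes", 0)) for seg in entry.iter(NZB_NS + "segment"))
        files.append((entry.get("subject", ""), size))
    return files


def without_recovery_files(files):
    """Drop entries whose subject looks like a PAR, PAR2 or pxx file."""
    return [(subject, size) for subject, size in files
            if not RECOVERY.search(subject)]
```

Summing the sizes of the two lists gives a quick idea of how many download megabytes you save by leaving the recovery files for later.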
I hope this has been helpful, or at least interesting!
--technogeek