Usenet Binary Files - why is this so complicated?
Multipart and encoded messages containing binary files
Usenet was originally designed for sharing text articles among computer networks, organized by subject. The standard for those articles was that they would consist of 7-bit ASCII characters and nothing more. That means only the 128 characters of the ASCII set could be used: the English letters, digits, and punctuation, plus a few control codes. Almost all systems will now allow 8-bit characters. This allows the accented and other special characters needed for most European languages, but it still does not safely carry every byte value found in binary files.
Because of this limitation, in order to include a binary file in a text transmission, it must be encoded - all of the bytes of data in the binary file must be represented in a text file by printable characters.
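To make that concrete, here is a minimal sketch of one such encoding (uuencoding, covered in more detail further on) using Python's standard `binascii` module. The helper name `uu_encode_line` is mine, for illustration only; it turns a few raw bytes into one line made up purely of printable characters.

```python
import binascii


def uu_encode_line(chunk: bytes) -> str:
    """Encode up to 45 raw bytes as one line of printable uuencoded text."""
    if len(chunk) > 45:
        raise ValueError("a uuencoded line carries at most 45 bytes")
    # b2a_uu prefixes a length character and appends a newline; drop the newline.
    return binascii.b2a_uu(chunk).decode("ascii").rstrip("\n")


# The first six bytes of a GIF file are the signature "GIF89a".
print(uu_encode_line(b"GIF89a"))  # prints &1TE&.#EA
```

The leading `&` is the length character (6 bytes), and every three bytes of the original become four printable characters.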
In the very early days of Bulletin Boards and UUnet (later Usenet), I saw a number of schemes tried out to do this. One of the earliest that I recall in the DOS era (CP/M had by that time gone the way of the Beta VCR - lamented, but overwhelmed by the more ubiquitous DOS) was to encode a binary file into ASCII characters with a simple "substitution cipher" using a DOS utility called Debug, and to transmit the file as a script or batch job that the Debug program could run to write the binary file back to the disk. It usually worked, and it made it possible to trade all sorts of utilities, pictures, etc.
Mostly, these types of techniques were used to transfer (incredibly crude by today's standards) pictures of nude models from gentlemen's magazines. From the very beginning, it was transferring Thunderscanned centerfolds, much more than sharing scientific data or culturally important art or anything else, that drove the development of the capability to transfer binary files over the cumbersome text systems. While not very tasteful, there it is: the immense capabilities for sharing great things are there largely because the engineers and technicians who drove their development wanted to trade nudie pics.
Just for reference, here's an example clipped out of a file representative of the quality of binaries in those early days. From the archive I had it in, it appears to be a UUnet download from about 1986 or 87. Now, whenever I look at these, I wonder why I thought it was worth so much effort to get them!
Since that time, the increasing capabilities of the system in general have made it possible to transmit MUCH higher quality images, sounds, even full videos or movies in breathtaking quality. The UUEncoding standard for file encoding began to be widely used and there came to be a lot of software to support it, and for many years it was the only way to reliably trade binary files on Usenet. It is also used by most email software. There are other acceptable encoding methods, such as base64, mimehex, and many other similar schemes, but in this article I'll focus on UUE and yEnc, as they are by far the most prevalent on Usenet.
Encoding a binary file of any kind results in a larger file than the original. Because fewer characters are available than there are possible byte values, combinations of multiple characters must be used to represent the bytes that have no printable ASCII equivalent. The two most common forms of encoding used today on Usenet are UUEncoding (UUE) and yEncoding (yEnc). yEnc is a fairly recent development, only having come into widespread use over the last couple of years. It makes smaller encoded files because it uses a larger character set for encoding, now that almost all Usenet systems support 8-bit characters rather than the old standard of 7-bit characters. To download a UUEncoded binary file from Usenet, you will actually transfer about 1.3 megabytes or more per megabyte of actual binary file that you wind up with. Downloading a yEncoded binary file, you'll need to download no more than about 1.1 megabytes per megabyte of binary file, and often less. This bandwidth reduction is a huge improvement if you upload or download large amounts of binaries, and it also reduces the load on the news servers themselves. An additional improvement over UUE is the addition of a "checksum" to each file, which lets your downloading and decoding software tell immediately whether the file is valid. If you are interested in a detailed discussion of the yEnc encoding standard, there is a very good one at http://www.yenc.org/ along with links to programs and source code.
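If you'd like to see where size figures like these come from, here is a rough back-of-the-envelope sketch. It assumes full 45-byte UUE lines, 128-character yEnc lines, two-byte line endings, and file data in which every byte value is equally likely; the function names are mine, for illustration only. Article headers and the begin/end marker lines are left out, so real transfers come out somewhat higher.

```python
def uue_expansion(line_ending_bytes: int = 2) -> float:
    """Bytes transferred per byte of original file for full uuencoded lines."""
    # One full line = 1 length character + 60 data characters + line ending,
    # and it carries 45 bytes of the original file.
    return (1 + 60 + line_ending_bytes) / 45


def yenc_expansion(line_length: int = 128, line_ending_bytes: int = 2) -> float:
    """Estimated bytes transferred per byte of original file for yEnc."""
    # Assumption: 4 of the 256 possible encoded values (NUL, LF, CR, '=')
    # need a two-character escape; everything else is a single character.
    chars_per_byte = 1 + 4 / 256
    return chars_per_byte * (line_length + line_ending_bytes) / line_length


print(round(uue_expansion(), 3))   # prints 1.4
print(round(yenc_expansion(), 3))  # prints 1.031
```

In other words, UUE costs roughly 40% extra on the wire, while yEnc costs only a few percent.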
Before we get into the details of recognizing and downloading binary files from Usenet, I'm going to add a note, because it's IMPORTANT: Always, Always, Always use a virus scanner with the most current virus definitions to scan ANY and EVERY file you download from Usenet. Get into the habit of simply scanning your download directories whenever you finish a session. Ordinary picture, video, and sound files (.jpg, .avi, .mpg, .mp3, etc. extension) are not programs and are far less likely to carry a virus, but it's a great habit to scan the download directories before using the files anyway. Often an executable file (with a .com, .bat, .scr, .pif, .exe, etc. extension) can be downloaded by accident along with a number of image files, and executed by mistake. There are bad actors out there who craft posts with subject lines that make it appear that the binary file associated with the message is an image file, when it is actually a virus, trojan, adware, or other malware executable.
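As a small extra safety net alongside a real virus scanner (not a replacement for one), a short script can list anything in a download directory that has one of the executable extensions mentioned above. This is only a sketch: the function name and the extension list are illustrative, and the list is certainly not complete.

```python
from pathlib import Path

# Extensions named in the text above; real-world lists are longer.
RISKY_EXTENSIONS = {".com", ".bat", ".scr", ".pif", ".exe"}


def find_risky_files(download_dir):
    """Return files under download_dir whose final extension looks executable."""
    root = Path(download_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in RISKY_EXTENSIONS
    )
```

Because only the final extension is checked, a file disguised as `picture.jpg.exe` is still reported. Calling `find_risky_files()` on your own download directory returns a list of paths to look at before you open anything.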
So - everyone keeps talking about binary files on Usenet, but all you see when you look are hundreds of posts full of gibberish, with very similar subject lines. How do you actually get the files? Why are there so many pieces? It just doesn't seem to make any sense!
First off, if you really want to deal with binaries in a meaningful way, you must get a binary newsreader. One problem is that large binary files must be transmitted as several messages, each containing only a part of the binary file. This is because really large messages uploaded as a single text article will almost never "propagate", that is, be sent from one server to another on a different system. Usually, a large binary file, once encoded into ASCII text, will be split into several articles, or messages, each from about 1000 to 5000 "lines" of text long. These segments must be combined, as well as decoded, to get the binary file saved to your disk. Binaries-oriented newsreaders will do this for you, allowing you to view the lists of posts as a list of files, with the multipart posts combined for you automatically into one subject line. They will also automatically combine and decode the posts into the files you want, directly into a download directory.
There are several newsreaders that were made specifically with features that make it easy to work with binaries. I use "Newsbin" ($35.00, http://www.newsbin.com) and would recommend it to anyone. A trial version can be downloaded and used for free for 10 days to try it out. The purchase price also entitles you to free upgrades for life. Other programs include "Agent" ($29.00, http://www.forteinc.com, "Free Agent" version available with fewer features) and "BNR" (Freeware, http://www.bnr2.org/). There are many, many others. Reviews and downloads of the free or trial versions of many of them are available at download.com.
One drawback is that if you like to participate in discussions in text groups, and work with binaries as well, it is hard to find one program that does it all. The binary newsreaders have much less capability for displaying and replying in threaded discussions, and the really good newsreaders made for that purpose do not handle binaries well. Personally I use Newsbin for binary groups, and Outlook Express for text groups. That way I have the features I like for each. Many newsreaders try to do both, and my personal opinion is that the attempt results in both modes of usage being less than perfect. Others will disagree with me on this, I'm quite sure. Search for discussions and reviews of newsreaders in any web search engine, and you will see many different opinions. My best advice is to work at it a bit and find what suits you best after trying out a few.
However, without a binaries-oriented newsreader to do the work for you, you can still work with binaries in Outlook Express, Netscape Messenger / mail, etc. - but it will cost you some extra effort and is not nearly as quick and easy. Here are some details of what binary posts consist of, so that you can see how it works.
Once a binary file has been encoded into text form for transmission, it can be posted as a message. If it is a small file, it can simply be transmitted as a single message. Here is an example of a small .gif file UUencoded and transmitted as a message:
From: technogeek@no.spam.please.tg (technogeek)
Subject: Test UUEnc File For Illustration - Thing.gif (1/1)
Organization: Society for the Promotion of Elfish Welfare
Lines: 14
NNTP-Posting-Host: be4b1ac3.news.usenetmonster.com
Xref: news.usenetmonster.com alt.binaries.test:14698481
begin 644 Thing.gif
M1TE&.#EA%``4`-4``````/___\ZWL;&QK6UM82$A("`@&-C8TI*
M2D)"0CDY.2DI*2$A(1`0$`@("/___P``````````````````````````````
M`````````````````````````````````"'Y!`$``"\`+``````4`!0```;(
MP(!P2"P:C\ACR#
MD9!TB2K2@4.`3]X1$0@3=T-?)!P-;W\2@0@4A4)Y)`$-#XQR#@@!D0%Y!P$,
M@:,,#@F<1"86%B09#PBPL;`!*T:K&@<B5U<B!BU%%A\N`,3%Q2XMM406*AT"
M)QXJ%QH:'BTJ)[8%%2,=!QTA!QD:&[@:V4,6!05?WAOO!P85%0<LP`4LWMX9
6XM_?]D4NL!A(L"#!)`@3"C&1)`@`.P``
`
end
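For the curious, here is a minimal sketch of what a decoder does with a message body laid out like the one above: find the `begin` line, decode each data line, and stop at `end`. It uses Python's standard `binascii` module; the function name `uu_decode_body` is just for illustration, and it assumes a complete, undamaged single-part post.

```python
import binascii


def uu_decode_body(text: str):
    """Return (file name, decoded bytes) from a uuencoded message body."""
    lines = iter(text.splitlines())
    # Skip headers and anything else until the "begin <mode> <name>" line.
    for line in lines:
        if line.startswith("begin "):
            _, _mode, name = line.split(" ", 2)
            break
    else:
        raise ValueError("no 'begin' line found")

    data = bytearray()
    for line in lines:
        if line == "end":
            return name, bytes(data)
        if line.strip() in ("", "`"):
            continue  # an empty or zero-length line carries no data
        data += binascii.a2b_uu(line.encode("ascii"))
    raise ValueError("no 'end' line found")


sample = "Subject: demo\n\nbegin 644 Thing.gif\n&1TE&.#EA\n`\nend\n"
print(uu_decode_body(sample))  # prints ('Thing.gif', b'GIF89a')
```

The sample string is a tiny made-up post holding only the six-byte GIF signature, not the full file shown above.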
And, an example in yEnc:
From: technogeek@no.spam.please.tg (technogeek)
Subject: Test for Illustration - yEnc - Thing.gif (1/1)
Organization: Society for the Promotion of Elfish Welfare
Lines: 8
NNTP-Posting-Host: be4b1ac3.news.usenetmonster.com
Xref: news.usenetmonster.com alt.binaries.test:14698482
=ybegin part=1 line=128 size=427 name=Thing.gif
=ypart begin=1 end=427
qspbc******)))**[)[*)*[
)[)*)))[)))))))*[[*[ƍ))
=@))!=@[[!!!߮tttl
llcccSSSKKK:::222)))**************************
**********************K#.+**Y*V*******0ꪚrVD
d[/T]Fx=JT8ʶZR?:+PrPT
my;;2=}mNF72lN+7982++1+6683nP@
@NC92+UpD1LL0Wo@IX*XWn@TG,QHTADDHWTQ
/?MG1GK1CDEDm@//E10??1V/VCoXBrJN2
=}4[N2*e
=yend size=427 part=1 pcrc32=59558693
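The basic yEnc rule is simple enough to sketch in a few lines: add 42 to every byte, and if the result is one of a few "critical" characters (NUL, line feed, carriage return, or `=`), write `=` followed by that value plus 64. The sketch below is my own illustration of that rule, not a complete implementation: it ignores line splitting and the `=ybegin`/`=ypart`/`=yend` lines, assumes well-formed input, and the function names are made up. The checksum helper assumes the usual CRC-32 written as eight hex digits.

```python
import zlib

ESCAPE = 0x3D                         # the '=' character
CRITICAL = {0x00, 0x0A, 0x0D, ESCAPE}  # NUL, LF, CR, '='


def yenc_encode(data: bytes) -> bytes:
    """Apply the basic yEnc byte transformation (no line splitting)."""
    out = bytearray()
    for byte in data:
        value = (byte + 42) % 256
        if value in CRITICAL:
            out.append(ESCAPE)
            value = (value + 64) % 256
        out.append(value)
    return bytes(out)


def yenc_decode(encoded: bytes) -> bytes:
    """Reverse yenc_encode, skipping any line endings in the input."""
    out = bytearray()
    stream = iter(encoded)
    for value in stream:
        if value in (0x0A, 0x0D):
            continue
        if value == ESCAPE:
            value = (next(stream) - 64) % 256
        out.append((value - 42) % 256)
    return bytes(out)


def crc32_hex(data: bytes) -> str:
    """CRC-32 of the decoded data as eight lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


print(yenc_encode(b"GIF89"))  # prints b'qspbc'
```

That `qspbc` is exactly how the data in the sample post begins, and the `=@` pairs scattered through it are escaped bytes.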
You can see, it just looks like nonsense - you need to use a program to view the message. In a newsreader like Outlook Express, a UUEncoded file that is a single message like the above example will show as a picture in the body of the message - all you have to do is right-click on it and save it. You can also simply save the message to a text file and open it with a program like WinZIP, which will extract the binary files from the encoded text. For larger files, however, you will see many messages for each large binary file, one for each segment, with subject lines like:
File For Illustration - Thing2.jpg (1/5)
File For Illustration - Thing2.jpg (2/5)
File For Illustration - Thing2.jpg (3/5)
File For Illustration - Thing2.jpg (4/5)
File For Illustration - Thing2.jpg (5/5)
Simply selecting one of the messages and saving its attachment will not work - it will result in a file that only contains the segment for that message. In order to get the file, if it's UUEncoded, in Outlook Express you must select all of the messages containing segments for the file, then right-click on the selected segments and choose "Combine and Decode".
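A binary newsreader does this grouping for you by reading the "(part/total)" marker at the end of each subject line. Here is a minimal sketch of that idea, assuming subjects shaped like the ones listed above; the function name and the exact pattern are my own, and real posts use many other subject formats.

```python
import re
from collections import defaultdict

# Matches "<anything> (3/5)" at the end of a subject line.
PART_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<part>\d+)/(?P<total>\d+)\)\s*$")


def group_parts(subjects):
    """Group multipart subjects by file and report which parts are missing."""
    seen = defaultdict(set)
    totals = {}
    for subject in subjects:
        match = PART_RE.match(subject)
        if not match:
            continue  # not a multipart binary post
        name = match.group("name")
        seen[name].add(int(match.group("part")))
        totals[name] = int(match.group("total"))
    return {
        name: {
            "total": totals[name],
            "missing": [n for n in range(1, totals[name] + 1) if n not in parts],
        }
        for name, parts in seen.items()
    }


subjects = [
    "File For Illustration - Thing2.jpg (1/5)",
    "File For Illustration - Thing2.jpg (2/5)",
    "File For Illustration - Thing2.jpg (4/5)",
    "File For Illustration - Thing2.jpg (5/5)",
]
print(group_parts(subjects))
```

Here part 3 never arrived, so the result lists it as missing, and the file cannot be completed until that segment shows up.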
Unfortunately this does not work for yEncoded files (yEnc), as Outlook Express does not support yEnc out of the box - you must add a second program to do the yEnc decoding for you. If you are determined to use Outlook Express or Netscape Messenger for binaries, with a bit of work you can add yEnc support with a proxy add-on - see http://www.brawnylads.com/yproxy for detailed instructions and FAQs. Here is the basic procedure:
Download the yEnc proxy software from the link at http://www.brawnylads.com/yproxy and install it. When you get to the installation screen, accept the default settings and agree to the license agreement. In the page of options with shortcuts listed, put a check in the option for "Startup Folder" so that yProxy will restart every time you reboot your machine. Simply click the "Next" buttons in the following screens to finish the installation.
Once you've installed yProxy, you must configure Outlook Express and yProxy to access your news server differently. Make sure you remember your news server, login, password, etc. information so that you can go back to your previous configuration if you ever want to!
In Outlook Express, click on "Tools >> Accounts" and click on the tab labeled "News". Select the server you want to access through yProxy and click on the "Properties" button. Select the "Server" tab, and in the box titled "Server Name", replace the server name or IP address with "127.0.0.1", or if you like "localhost" - they will both work just as well. By doing this you are now using the yProxy program as a server, and the yProxy program will use your "real" news server to get the articles. Save your changes.
Start up yProxy using the icon in your start menu, desktop, or startup menu - you will see a configuration screen. In the box titled "News Server" in yProxy, put in the name of the news server that you replaced in Outlook Express (e.g. news.usenetmonster.com, or whatever your setting was), and enter your name and password if necessary - it usually will be necessary if you use a premium news service. Click on the "Start" button, and use Outlook Express as you usually would. yEnc posts can now be used just like UUEncoded posts.
When your computer is restarted and the yProxy program comes up, make sure you click on the "Start" button before running Outlook Express - otherwise News will not be available, as yProxy now makes all news requests for you. If you didn't install the yProxy program in the Startup folder, you will need to run it from the start menu or desktop icon and click "Start" before starting Outlook Express in order to access News.
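If News seems to be unavailable and you aren't sure whether the proxy is actually running, a quick check is to connect to it yourself and read the greeting line. The sketch below is only an illustration and rests on assumptions: that the proxy listens on 127.0.0.1 at the standard news port 119 (check your own settings), and that, like a normal news server, it greets new connections with a line starting with 200 or 201. The function names are mine.

```python
import socket


def nntp_greeting(host="127.0.0.1", port=119, timeout=5.0):
    """Return the first line sent by the server or proxy, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(512)
    except OSError:
        return None  # nothing listening, connection refused, or timed out
    if not data:
        return None
    return data.decode("latin-1").splitlines()[0]


def looks_ready(greeting):
    """True if the greeting starts with a 'service available' code (200 or 201)."""
    return greeting is not None and greeting[:3] in ("200", "201")
```

If `looks_ready(nntp_greeting())` comes back false, the proxy is probably not started yet, and the newsreader will not be able to reach News either.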
Finding binary files
There are literally thousands of newsgroups now specializing in trading binary files of various types.
The hierarchy starting with "alt.binaries" is the place to start - simply look for groups with names that describe
the types of files you are looking for. Some groups can be difficult to work with, especially at first, until
you get used to the search and filter capabilities of a good binary newsreader, because they have huge amounts
of traffic - often gigabytes of data, in tens of thousands of individual messages, posted on a daily basis.
alt.binaries.movies and alt.binaries.multimedia are two examples of such high volume groups - trying to make
sense of these with a non-binaries-oriented newsreader such as Outlook Express or Netscape Messenger would be
virtually impossible.
When you download the list of newsgroups available from your news server, most newsreaders will supply
a number of available posts for each one - so while you are starting out, you can avoid the groups with
hundreds of thousands of available posts until you get your feet wet. Of course, these are also the groups
"where the action is" so to speak, so once you get the basics of finding and downloading posts figured out,
those will be the newsgroups where you'll find the largest selection of things to download.
A quick note about news servers - they are not all created equal!
You will quickly find that while there appear to be many files available for download, you often can't seem
to find all the parts necessary to complete the files that you want - or that files seem to disappear quickly,
often before you get the chance to download them. These are two of the most important considerations in choosing
a news provider - Completion, and Retention. If you are serious about working with binaries on Usenet, the News
service that was provided at no extra cost with your internet access probably will not do. Some are OK, others
are simply unusable - but almost none will provide really good completion of messages in the binary groups and
keep all of the message parts for binary files long enough for you to use them. It takes a huge dedication of
resources to carry the binary newsgroups, have good connections to multiple "news feeds" to ensure that all of
the posts sent from various places to a particular newsgroup get collected (Completion), and to have the vast amounts
of disk storage - terabytes, on most good news providers - to be able to keep the posts long enough for the users to be able
to access them (Retention).
This is another area where you will find as many opinions as there are people to hold them. At the bottom end of the range are
most "free" services included with internet access: the systems are added as an afterthought, and the amount of storage that can be
allocated to any particular group is too low. For text groups, this is usually fine - the messages don't take up much room, and
can be kept for weeks before old messages are deleted to make room for new ones. In high volume binaries groups, however,
if there are only a few hundred megabytes of storage allocated to the group, posts may only last hours or even minutes before
being deleted to make room for new posts. This makes it virtually impossible to successfully download anything.
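The arithmetic behind that is simple: retention is roughly the storage set aside for a group divided by how fast new posts arrive. Here is a tiny sketch with made-up numbers (300 megabytes of storage and 5 gigabytes of new posts a day are illustrative assumptions of mine, not measurements of any real server).

```python
def retention_hours(storage_mb: float, posted_mb_per_day: float) -> float:
    """Rough hours a post survives before being pushed out by newer posts."""
    return storage_mb / posted_mb_per_day * 24


# Hypothetical busy binary group: 300 MB of storage, 5000 MB posted per day.
print(round(retention_hours(300, 5000), 2))  # prints 1.44
```

With numbers like these a post lasts well under two hours, which is rarely long enough to collect every part of a large file.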
I've selected "UsenetMonster" (about $1 per gigabyte, www.UsenetMonster.com) as a News provider. I've been extremely happy with both the completion and the retention -
I've been able to retrieve binaries that were posted weeks earlier in even very busy groups. Again, there are many
people that swear by different providers; that's just my Humble Preference.
This article is getting long enough! - so I'll address dealing with various kinds of files, archives, incomplete file
repair schemes and related topics in another article. But I will take this last opportunity to warn anyone getting started -
ALWAYS scan your download directories for viruses! Pay close attention that there are no executable files (with a .com, .bat, .scr, .pif, .exe, etc. extension)
carrying viruses, trojans, adware, or other bad things, mixed in with your pictures or sounds by some sneaky scoundrel!
Have fun, and be careful out there!
--technogeek