NAME
    gatherstats - process statistical data from a raw source

SYNOPSIS
    gatherstats [-Vhdt] [-m *YYYY-MM* | *YYYY-MM:YYYY-MM*] [-s *stats*] [-c
    *filename template*] [--hierarchy *TLH*] [--rawdb *database table*]
    [--groupsdb *database table*] [--hostsdb *database table*] [--clientsdb
    *database table*] [--conffile *filename*]

REQUIREMENTS
    See "README" in doc.

DESCRIPTION
    This script extracts and processes statistical information for a given
    time period from a database table that is fed by feedlog.pl, and writes
    its results to one or more other database tables. Entries marked with
    *'disregard'* in the database will be ignored; currently, you have to
    set this flag yourself, using your database management tools. You can
    exclude erroneous entries that way, e.g. spam or automatic reposts
    (think of cancel floods and resurrectors).

    The time period to act on defaults to last month; you can assign another
    time period or a single month via the --month option (see below).

    By default gatherstats will process all types of information; you can
    change that using the --stats option and assigning the type of
    information to process.

    Possible information types include:

    groups (postings per group per month)
       gatherstats will examine Newsgroups: headers. Crosspostings will be
       counted for each single group they appear in. Groups not in *TLH*
       will be ignored.

       gatherstats will also add up the number of postings for each
       hierarchy level, but only count each posting once. A posting to
       de.alt.test will be counted for de.alt.test, de.alt.ALL and de.ALL,
       respectively. A crossposting to de.alt.test and de.alt.admin, on the
       other hand, will be counted for de.alt.test and de.alt.admin each,
       but only once for de.alt.ALL and de.ALL.

       Data is written to *DBTableGrps* (see "INSTALL" in doc); you can
       override that default through the --groupsdb option.

    hosts (postings from host per month)
       gatherstats will examine Injection-Info:, X-Trace: and Path: headers
       and try to normalize them. The sum of all detected hosts will also be
       saved for each month. Groups not in *TLH* will be ignored.

       Data is written to *DBTableHosts* (see "INSTALL" in doc); you can
       override that default through the --hostsdb option.

    clients (postings by client per month)
       gatherstats will examine User-Agent:, X-Newsreader: and X-Mailer:
       headers and try to remove comments and non-standard contents. Clients
       and client versions are counted separately. The sum of all detected
       clients will also be saved for each month. Groups not in *TLH* will
       be ignored.

       Data is written to *DBTableClnts* (see "INSTALL" in doc); you can
       override that default through the --clientsdb option.
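
    The counting rule described for *groups* above (each posting counted
    once per group and once per hierarchy level) can be illustrated with a
    minimal Python sketch. This is not the script's own code; the function
    names hierarchy_keys and count_posting and the default *TLH* of "de"
    are assumptions made for this example.

```python
from collections import Counter

def hierarchy_keys(group):
    """Return the group itself plus one '<prefix>.ALL' key per hierarchy level."""
    parts = group.split(".")
    keys = {group}
    for depth in range(1, len(parts)):
        keys.add(".".join(parts[:depth]) + ".ALL")
    return keys

def count_posting(counts, newsgroups_header, tlh=("de",)):
    """Count one posting; every group and every level is counted at most once."""
    keys = set()
    for group in newsgroups_header.split(","):
        group = group.strip()
        # Skip groups outside the configured top-level hierarchies (assumed TLH).
        if not any(group == t or group.startswith(t + ".") for t in tlh):
            continue
        keys |= hierarchy_keys(group)
    # Using a set means a crossposting adds 1 to de.alt.ALL, not 2.
    counts.update(keys)
    return counts

counts = Counter()
count_posting(counts, "de.alt.test")
count_posting(counts, "de.alt.test,de.alt.admin")
```

    After these two postings, de.alt.test has a count of 2, de.alt.admin a
    count of 1, and de.alt.ALL and de.ALL a count of 2 each.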

  Configuration
    gatherstats will read its configuration via Config::Auto from
    newsstats.conf, which should be present in etc/, or from a configuration
    file given with the --conffile option.

    See "INSTALL" in doc for an overview of possible configuration options.

    You can override configuration options by using the --hierarchy,
    --rawdb, --groupsdb, --clientsdb and --hostsdb options, respectively.

OPTIONS
    -V, --version
       Display version and copyright information and exit.

    -h, --help
       Display this man page and exit.

    -d, --debug
       Print debugging information to STDOUT while processing (number of
       postings per group).

    -t, --test
       Do not write results to database. You should use --debug in
       conjunction with --test ... everything else seems a bit pointless.

    -m, --month *YYYY-MM[:YYYY-MM]*
       Set processing period to a single month in YYYY-MM format or to a
       time period between two months in YYYY-MM:YYYY-MM format (two months,
       separated by a colon). Defaults to last month.

    -s, --stats *type*
       Set processing type to one of *all*, *groups*, *hosts* or *clients*.
       Defaults to *all*.

    -c, --checkgroups *filename template*
       Relevant only for newsgroup stats (*groups*).

       Check each group against a list of valid newsgroups read from a file,
       one group on each line and ignoring everything after the first
       whitespace (so you can use a file in checkgroups format or (part of)
       your INN active file).

       The filename is taken from *filename template*, amended by each
       --month gatherstats is processing in the form of *template-YYYY-MM*,
       so that

           gatherstats -m 2010-01:2010-12 -c checkgroups

       will check against checkgroups-2010-01 for January 2010, against
       checkgroups-2010-02 for February 2010 and so on.

       Newsgroups not found in the checkgroups file will be dropped (and
       logged to STDERR), and newsgroups found there but having no postings
       will be added with a count of 0 (and logged to STDERR).

    --hierarchy *TLH* (newsgroup hierarchy/hierarchies)
       Override *TLH* from newsstats.conf.

       *TLH* can be a single word or a comma-separated list.

    --rawdb *table* (raw data table)
       Override *DBTableRaw* from newsstats.conf.

    --groupsdb *table* (postings per group table)
       Override *DBTableGrps* from newsstats.conf.

    --hostsdb *table* (host data table)
       Override *DBTableHosts* from newsstats.conf.

    --clientsdb *table* (client data table)
       Override *DBTableClnts* from newsstats.conf.

    --conffile *filename*
       Read configuration from *filename* instead of newsstats.conf.
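
    The --checkgroups behaviour described above (first whitespace-separated
    field per line is the group name; unknown groups are dropped, missing
    groups are added with a count of 0) can be sketched as follows. This is
    an illustration in Python, not the script's own code; the names
    read_valid_groups and reconcile are hypothetical, and the sketch covers
    plain group names only, not the per-level totals.

```python
def read_valid_groups(lines):
    """Collect group names: the first field of each non-empty line."""
    valid = set()
    for line in lines:
        fields = line.split()
        if fields:
            # Everything after the first whitespace is ignored.
            valid.add(fields[0])
    return valid

def reconcile(counts, valid):
    """Drop groups not in 'valid' and add valid groups without postings as 0."""
    result = {}
    dropped = []
    added = []
    for group, n in counts.items():
        if group in valid:
            result[group] = n
        else:
            dropped.append(group)
    for group in valid:
        if group not in result:
            result[group] = 0
            added.append(group)
    return result, sorted(dropped), sorted(added)

lines = ["de.alt.test\tTests", "de.alt.admin Administration", ""]
counts = {"de.alt.test": 5, "de.alt.bogus": 2}
result, dropped, added = reconcile(counts, read_valid_groups(lines))
```

    Here de.alt.bogus is dropped and de.alt.admin is added with a count of
    0; the two returned lists correspond to what would be logged to STDERR.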

INSTALLATION
    See "INSTALL" in doc.

EXAMPLES
    Process all types of information for last month:

        gatherstats

    Do a dry run, showing results of processing:

        gatherstats --debug --test

    Process all types of information for January of 2010:

        gatherstats --month 2010-01

    Process only the number of postings per group for the year 2010,
    checking against checkgroups-*:

        gatherstats -m 2010-01:2010-12 -s groups -c checkgroups
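
    To make the file naming in the last example concrete, the following
    Python sketch expands a --month period into single months and derives
    the *template-YYYY-MM* filenames from it. It only illustrates the
    naming scheme described under --checkgroups; the function names
    months_in_period and checkgroups_files are hypothetical and not part
    of gatherstats.

```python
def months_in_period(period):
    """Expand 'YYYY-MM' or 'YYYY-MM:YYYY-MM' into a list of 'YYYY-MM' strings."""
    start, _, end = period.partition(":")
    end = end or start  # a single month is a period of length one
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    months = []
    while (year, month) <= (end_year, end_month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def checkgroups_files(template, period):
    """Build one filename per month in the form template-YYYY-MM."""
    return [f"{template}-{m}" for m in months_in_period(period)]

files = checkgroups_files("checkgroups", "2010-01:2010-12")
```

    For the period above, files holds twelve names, from
    checkgroups-2010-01 to checkgroups-2010-12.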

FILES
    bin/gatherstats.pl
        The script itself.

    lib/NewsStats.pm
        Library functions for the NewsStats package.

    etc/newsstats.conf
        Runtime configuration file.

BUGS
    Please report any bugs or feature requests to the author or use the bug
    tracker at <https://code.virtcomm.de/thh/newsstats/issues>!

SEE ALSO
    - "README" in doc

    - "INSTALL" in doc

    This script is part of the NewsStats package.

AUTHOR
    Thomas Hochstein <thh@thh.name>

COPYRIGHT AND LICENSE
    Copyright (c) 2010-2013, 2025 Thomas Hochstein <thh@thh.name>

    This program is free software; you may redistribute it and/or modify it
    under the same terms as Perl itself.

