User:Cpiral/relink.pl

This is listed at {{slink|Wikipedia:Tools/Editing tools|Relink}} (starting 17 Dec 2015) after being used dozens of times to clean up redlinks posted at :Category:Wikipedia red link cleanup.

Purpose

Purposes:

Usage

Given some wikitext it can list all the links.

This list becomes your links-configuration file.

You edit it to remove links.

To add links, you type up a list of the links you want to add

and make that your links-configuration file.

Then you rerun the script against the wikitext

to produce the desired linkage for that wikitext.

See the output of relink -h for usage and instructions.

You'll need Perl 5.14 or later (the script uses the /r substitution flag) and the Getopt::Std module, which ships with Perl.

Use redirection or piping to specify

< input and > output source files.

You name your own input, output, and configuration files.

Use command-line options

  • -l source_filename to list, or to create your links configuration file.
  • -k links_configfile to keep
  • -r links_configfile to remove
  • -a links_configfile to add

to keep, remove, or add the links in your links-configuration file.
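As a rough illustration of how these options reach the script, the sketch below parses a relink-style argument list with Getopt::Std. The option string mirrors the one in the Source section; the `parse_args` wrapper and the sample arguments are hypothetical and exist only for this example.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical wrapper (not part of relink): parse an argument list into a hash.
# Letters followed by ":" take a filename argument; h and m are plain flags.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;          # getopts works on @ARGV
    my %opt;
    getopts('l:u:r:k:a:c:hm', \%opt) or die "bad options\n";
    return \%opt;
}

# Flags may be bundled: -ma promote means -m plus -a promote.
my $opt = parse_args('-ma', 'promote');
print "add from $opt->{a}", ($opt->{m} ? ' (every occurrence)' : ''), "\n";
```

Because -m is a plain flag and -a takes a value, the bundled form `-ma promote` and the separate form `-m -a promote` parse the same way.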

So to modify the way a file is linked, you can

  • add links from a list you wrote (a links-configuration file).
  • remove links listed in an auto-generated links-configuration file you edited.
  • keep links listed in an auto-generated links-configuration file you edited.

Save the output of relink -l to generate the links configfile.

All the links are listed, and they're in the order they were found.

Then, while viewing both the rendered page and the links configuration file,

you use the rendered page to decide where to jump to in the configuration file

to do the removal of links. The editing is only the removal of one or more lines.

What remains may be what's kept or what's removed from the linkage.

For example, to clean up redlinks,

first gauge which is greater, the redlinks or the blue links.

If most of the links are blue, remove redlinks and use relink -k.

If most of the links are red, remove blue links and use relink -r.

Examples

What is outside the link does not count for uniqueness.

$ cat wikitext

[[link]] [[link|label]] 3[[link]]ed 4[[link|label]]ling

$ relink -c wikitext

2 link

2 link|label

4 total wikilinks

$ relink -l wikitext

link

link|label

2 unique wikilinks
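The counting shown above can be sketched in a few lines of Perl. This is a minimal illustration, not the script's own code: `count_links` is a hypothetical helper, the sample string is made up, and the namespace filter (Category, Image, File, Media) is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of relink): return [inside, count] pairs for
# each distinct link, in the order first seen. Only the text between [[ and ]]
# counts, so anything touching the link from outside is ignored.
sub count_links {
    my ($wikitext) = @_;
    my (@order, %seen);
    # non-greedy capture of everything between the brackets
    while ($wikitext =~ /\[\[(.*?)\]\]/g) {
        push @order, $1 unless $seen{$1}++;
    }
    return map { [ $_, $seen{$_} ] } @order;
}

my $text = '[[link]] [[link|label]] 3[[link]]ed 4[[link|label]]ling';
printf "%d %s\n", $_->[1], $_->[0] for count_links($text);
```

Keeping a separate `@order` array preserves first-seen order, which a plain hash would not.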

= Remove or keep =

$ cat wikitext

[[title]] [[title|label]] [[title3]]ed 4[[title4|label]]ling

$ relink -l wikitext > links

4 unique wikilinks

$ cat links

title

title|label

title3

title4|label

Editing the file we called links here, and removing two lines...

$ cat links

title3

title4|label

Here are two opposite uses of the remaining two lines, for the sake of example.

$ relink -r links < wikitext

[[title]] [[title|label]] title3ed 4labelling

2 links removed

$ relink -k links < wikitext

title label [[title3]]ed 4[[title4|label]]ling

2 links removed

To save output, use redirection

$ relink -r links < wikitext > processed_file

You can use the processed file to act as new wikitext to do more linkage configuration

before uploading the final processed_file to the edit box.
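The keep operation amounts to a set difference followed by a removal pass. The sketch below shows one way to express that, assuming the links have already been extracted. `links_to_remove` and `unlink_all` are hypothetical names, and unlike the script this version replaces every occurrence of a link rather than the first one per line.

```perl
use strict;
use warnings;

# Hypothetical helper: every link in @$found that is absent from @$keep,
# computed with a hash slice and returned in first-seen order.
sub links_to_remove {
    my ($found, $keep) = @_;
    my %diff;
    @diff{@$found} = ();      # every link found becomes a key
    delete @diff{@$keep};     # drop the keepers
    my %done;
    return grep { exists $diff{$_} && !$done{$_}++ } @$found;
}

# Hypothetical helper: replace [[title]] or [[title|label]] with its visible text.
sub unlink_all {
    my ($wikitext, @remove) = @_;
    my $count = 0;
    for my $link (@remove) {
        my $text = $link =~ s/.*\|//r;    # label if piped, else the title
        $count += ($wikitext =~ s/\[\[\Q$link\E\]\]/$text/g) || 0;
    }
    return ($wikitext, $count);
}

my $wikitext = '[[title]] [[title|label]] [[title3]]ed 4[[title4|label]]ling';
my @found    = $wikitext =~ /\[\[(.*?)\]\]/g;
my @remove   = links_to_remove(\@found, ['title3', 'title4|label']);
my ($out, $n) = unlink_all($wikitext, @remove);
print "$out\n$n links removed\n";
```

The `\Q...\E` quoting matters because link text often contains characters such as `|`, `(` or `.` that are special in a regular expression.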

= Add =

$ cat wikitext

[[title]] [[title|label]] label label title title

$ cat promote

title

title | label

$ relink -a promote < wikitext

[[title]] [[title|label]] [[title|label]] label [[title]] title

2 links added.

$ relink -ma promote < wikitext

[[title]] [[title|label]] [[title|label]] [[title|label]] [[title]] [[title]]

4 links added.
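Adding links relies on a negative lookahead so that text already inside a link is left alone. The following is a minimal sketch of that idea under the same rule the script uses (skip a phrase that sits directly before `|` or `]]`). `add_link` is a hypothetical helper, and it quotes the phrase with `\Q...\E`, which the script itself does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a phrase in link markup unless it already sits
# directly before "|" or "]]", that is, inside an existing link.
# $every true  => link every unlinked occurrence (like -ma)
# $every false => link only the first one (like -a)
sub add_link {
    my ($wikitext, $phrase, $markup, $every) = @_;
    my $unlinked = qr/\Q$phrase\E(?! *(?:\||\]\]))/;
    my $n = $every ? ($wikitext =~ s/$unlinked/$markup/g)
                   : ($wikitext =~ s/$unlinked/$markup/);
    return ($wikitext, $n || 0);
}

my $text = '[[title]] [[title|label]] label label title title';
my ($once, $n1) = add_link($text, 'title', '[[title]]', 0);
my ($all,  $n2) = add_link($text, 'title', '[[title]]', 1);
print "$once\n$n1 links added.\n$all\n$n2 links added.\n";
```

Note that the match has no word boundary, so a phrase such as `title` would also match inside `title3`; a stricter pattern could add `\b` if that is unwanted.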

Source

#!/usr/bin/perl
# Cpiral at gmail, User:Cpiral

use Getopt::Std; getopts 'l:u:r:k:a:c:hm';

use English;

$LIST_SEPARATOR = "";

=pod

Development/testing imperatives:

+ output deleted titles for talk page report (else info lost)

+ use strict compliance to lexify global variables

=cut

$ignore = qr/category|image|file|media/i;

BEGIN {

$USAGE = '

Process your link "label" structures.

source_file: original wikitext (You must download it.)

link_configfile: list of labels. You name and create it.

processed_file: final wikitext (You can reprocess it.)

To remove links:

1) relink -l source_file > link_configfile

The -l option automatically creates a linkage snapshot.

You can manually create your own instead of this step.

2) Edit link_configfile.

Change the snapshot into a new, wanted configuration.

You only delete lines. (See next for which ones.)

3)

a) relink -r link_configfile < source_file > processed_file

The -r option removes the labels from their linkage-markup.

In this case the list of labels are unwanted, e.g. redlinks.

OR

b) relink -k link_configfile < source_file > processed_file

The -k will keep _only_ the list of "keeper" labels.

The processed_file will have all _other_ links removed.

(Relink ignores the Category, Image, Media, or File namespace.)

In this case the list of labels are a new snapshot of linkage.

Note that processed_file is a source_file, and can be reprocessed.

You preview by leaving off the output-redirection: > processed_file.

To add a set of missing links to a list of pages, for each page:

relink -a link_configfile < source_file > processed_file

Hand create your own link_configfile

Synopsis of relink:

relink -l source_file

relink { -r | -k | -[m]a } link_configfile

relink [-c] source_file

-l outputs the labels of all links in the source_file

-r removes linkage from all given labels in link_configfile

-k keeps only links given in the link_configfile, removes others

-a adds links given in the link_configfile, ignores others

-ma (multiple adds) links every occurrence

-c outputs the count of links in the wikitext

';

}

if ( $opt_h ) {

print $USAGE;

exit;

}

# Input the MediaWiki page source

if ($opt_l or $opt_c){

$either = $opt_l ? $opt_l : $opt_c;

open (SOURCE, "<", $either ) or die "Cannot read $either: $!";

while ( <SOURCE> ) # wikitext

{

if ( m/\[\[/ ) { # if wikitext may have a link

# then get all links on that line

# ?! matches by look-ahead

# .*? matches ASAP, and (.*?) is captured as $1

while (m/\[\[(?!$ignore)(.*?)\]\]/g) {

push @links, "$1\n"; # entire|insides

}

}

}

foreach $link (@links) {

$seen{$link}++;

} # needs some kind of order

$count_unique = $count_total = 0;

foreach $link (@links) {

if ( $opt_c ) {

print "$seen{$link} " if $seen{$link};

$count_total += $seen{$link};

}

if ($seen{$link}) {

print "$link";

$count_unique ++;

delete $seen{$link};

}

}

# close SOURCE;

print STDERR "$count_unique unique wikilinks \n" if $opt_l;

print STDERR "$count_total total wikilinks \n" if $opt_c;

}

if ($opt_a) {

$count = 0;

open (LINK_CONFIGFILE, "<", $opt_a ) or die "Cannot read $opt_a: $!";

@add = <LINK_CONFIGFILE>;

chomp (@add);

foreach ( @add ) {

if ( /[|]/ ) {

# e.g. wikt:neutralize | neutralize

($title,$label) = split /\s*\|\s*/; # configfile ignores spacing

$label =~ s/\s+$//; # no hidden whitespace

$title =~ s/^\s+//; # no leading whitespace

$links{$label} = "[[$title|$label]]"; # piped link, found by its label

} else { # title needs no label

s/^\s+//; # no leading whitespace

s/\s+$//; # no trailing whitespace

$links{$_} = "[[$_]]";

}

}

while ( <> ) # reading the wikitext from standard input

{

foreach $phrase ( keys %links ) { # title or title|label

if ( not $opt_m ) { # feature: link nth occurrence

if ( m/$phrase(?! *(\||\]\]))/ ) { # looking ahead, no | or ]]

# next regexp says "followed by neither ]] nor |"

s/$phrase(?! *(\||\]\]))/$links{$phrase}/;

delete $links{$phrase}; # link first occurrence

$count++;

}

}

else { # link every occurrence

if ( m/$phrase(?! *(\||\]\]))/ ) { #

$count++ while m/$phrase(?! *(\||\]\]))/g; # count matches

s/$phrase(?! *(\||\]\]))/$links{$phrase}/g; # replace matches

}

}

}

print;

}

print STDERR "$count links added.\n";

}

if ($opt_r) {

open (LINK_CONFIGFILE, "<", $opt_r ) or die "Cannot read $opt_r: $!";

@remove = <LINK_CONFIGFILE>;

chomp @remove;

$count = 0;

while ( <> ) {

if ( m/\[\[/ ) {

foreach $link ( @remove ) {

# autogenerated configuration file line format: title | label

$replacement = ($link =~ s/.*\|//r); # replacement is label

$count++ if s/\[\[\Q$link\E\]\]/$replacement/; # replace the whole link with its label

}

}

print STDOUT;

}

print STDERR "$count links removed\n";

}

if ($opt_k) {

@source = <>;

open (LINK_CONFIGFILE, "<", $opt_k ) or die "Cannot read $opt_k: $!";

@keep = <LINK_CONFIGFILE>;

chomp @keep;

foreach (@source) {

if ( m/\[\[/ ) {

while (m/\[\[(?!$ignore)(.*?)\]\]/g) { # ".*?" matches ASAP

push @oldlinks, $1;

}

}

}

@diff{@oldlinks} = @oldlinks;

delete @diff{@keep};

@remove = keys %diff;

foreach ( @source ) {

$source = $_;

if ( m/\[\[/ ) {

foreach $link (@remove) {

# structure: [[title|label]]

$replacement = ($link =~ s/.*\|//r); # replacement is label

$count++ if

$source =~ s/\[\[\Q$link\E\]\]/$replacement/; # replace the whole link with its label

}

}

print STDOUT $source;

}

print STDERR $count ? $count : 0, " links removed\n";

}

See also