User talk:MER-C/Wiki.java

{{notice|This page is kept for historical reasons. Urgent stuff goes to User talk:MER-C, while everything else goes to [http://code.google.com/p/wiki-java/issues/list the bug tracker]. MER-C 03:32, 10 September 2012 (UTC)}}

Changelog

{| class="wikitable"
! Version
! Diff
! Comment
|-
| 0.01
| [http://en.wikipedia.org/w/index.php?title=User:MER-C/Wiki.java&oldid=118481626 diff]
| Initial.
|-
| 0.02
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=119466334&oldid=118481626 diff]
| Add default constructor, getCategoryMembers(String name).
|-
| 0.03
| [http://en.wikipedia.org/w/index.php?title=User:MER-C/Wiki.java&diff=next&oldid=119466334 diff]
| Added namespace support.
|-
| 0.04
| [http://en.wikipedia.org/w/index.php?title=User:MER-C/Wiki.java&diff=next&oldid=137235312 diff]
| Moved to use the [http://en.wikipedia.org/w/api.php mediawiki api]. Added category intersection. License -> GPL 3.
|-
| 0.05
| [http://en.wikipedia.org/w/index.php?title=User:MER-C/Wiki.java&diff=next&oldid=144563331 diff]
| Added logging, sketchy user support. Worked around silly api limitation of 500/5000 elements returned per query.
|-
| 0.06
| [http://en.wikipedia.org/w/index.php?title=User:MER-C/Wiki.java&diff=next&oldid=161254138 diff]
| Fields for the various mediawiki logs. Added spamsearch, getDomain() (should have done this earlier).
|-
| 0.07
| [http://en.wikipedia.org/w/index.php?title=User:MER-C/Wiki.java&diff=next&oldid=162427884 diff]
| Optimized for bandwidth, add userRights() caching. Debug.
|-
| 0.08
| [http://en.wikipedia.org/w/index.php?title=User:MER-C/Wiki.java&diff=next&oldid=164915124 diff]
| Log support. Added a few utility methods.
|-
| 0.09
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=169833050&oldid=167389025 diff]
| Add listPages(), editing throttle, better cookies. Now uses GZIP compression. Various other fixes.
|-
| 0.10
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=173067521&oldid=170515686 diff]
| Add persistence, getImage(), whatLinksHere(), imageUsage(), getCurrentDatabaseLag(), getRenderedText(), getTalkPage(), getProtectionLevel(), pageExists(). We now check whether a page is protected before editing it. Various fixes, including ones below.
|-
| 0.11
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=178668370&oldid=173067521 diff]
| Add upload(), parseList(), hasNewMessages(), assertions, maxlag. Rewrite login(), intersection().
|-
| 0.12
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=185840755&oldid=178668370 diff]
| Add ip block list, transclusions. Exception overhaul. Various optimizations.
|-
| 0.13
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=189924502&oldid=185840755 diff]
| Add random page, thumbnails, ability to parse arbitrary wikitext.
|-
| 0.14
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=208313866&oldid=189924502 diff]
| Added arbitrary scriptpath support, search, statistics, some other stuff.
|-
| 0.15
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=217906441&oldid=208313866 diff]
| Short/long pages, [http://en.wikipedia.org/w/index.php?oldid=217725026#Wiki.java bug fixes].
|-
| 0.16
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=233514801&oldid=219460129 diff]
| Added API edit, move, edit counter, various "stuff about this page" methods.
|-
| 0.17
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=244331540&oldid=233514801 diff]
| rm screen-scrape edit; add contribs(), Revision, section editing, purge()
|-
| 0.18
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=269087073&oldid=249807055 diff]
| More "stuff about this page" methods, condense, status check, better error handling
|-
| 0.19
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=276948335&oldid=269087073 diff]
| Rollback, page history, bug fixes
|-
| 0.20
| [http://en.wikipedia.org/w/index.php?title=User%3AMER-C%2FWiki.java&diff=302923289&oldid=276948335 diff]
| Image history, old images, undo, new pages, revdelete bug fixes
|-
| 0.21
| [http://en.wikipedia.org/w/index.php?diff=343496389 diff]
| diff, attempt at upload API, various bug fixes
|-
| 0.22
| [http://en.wikipedia.org/w/index.php?diff=344760882 diff]
| quick user agent fix
|}

Special page equivalents

See Special:Specialpages for a list of special pages. The text on special pages may be edited by editing the appropriate system message.

{| class="wikitable"
! Special page
! Equivalent code
|-
| Special:Allmessages
| listPages("MediaWiki:", Wiki.FULL_PROTECTION, Wiki.ALL_NAMESPACES)
|-
| Special:Allpages
| listPages()
|-
| Special:Contributions
| contribs() (excludes Special:Contributions/newbies)
|-
| Special:Ipblocklist
| getIPBlockList()
|-
| Special:Linksearch
| spamsearch()
|-
| Special:Listusers
| allUsers()
|-
| Special:Log
| getLogEntries()
|-
| Special:Longpages
| longPages()
|-
| Special:Movepage
| move()
|-
| Special:Mypage
| String title = "User:" + wiki.getCurrentUser().getUsername();
|-
| Special:Mytalk
| String title = "User talk:" + wiki.getCurrentUser().getUsername();
|-
| Special:Newimages
| getLogEntries(int amount, Wiki.UPLOAD_LOG) or newPages(int amount, Wiki.IMAGE_NAMESPACE)
|-
| Special:Newpages
| newPages()
|-
| Special:Prefixindex
| listPages()
|-
| Special:Protectedpages
| listPages()
|-
| Special:Random
| random()
|-
| Special:Search
| search()
|-
| Special:Shortpages
| shortPages()
|-
| Special:Statistics
| getSiteStatistics()
|-
| Special:Upload
| upload()
|-
| Special:Userlogin
| login()
|-
| Special:Userlogout
| logoutServerSide()
|-
| Special:Whatlinkshere
| whatLinksHere()
|}

Two Errors

Hello,

There are two small errors in your code.

First, in the method getPageText(String title), the line

text.append(line);

should be

text.append(line + "\n");

Second, the login method doesn't work on the German Wikipedia. The bot logs in correctly, but the method returns false because the text "Login successful" doesn't appear on the German login page.

--88.72.43.131 11:05, 14 November 2007 (UTC)

I hope you can understand me. I know my English isn't very good ;)

:Fixed both, but it will be some time before they are live, since the todo list for 0.10 is quite long. (If you can't wait, the fix for the second one is to replace "Login successful" with "wgUserName = \"" + username + "\"".) MER-C 05:50, 16 November 2007 (UTC)
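The workaround mentioned in the reply above (testing for the wgUserName variable rather than localized interface text) might look roughly like the following sketch. The class and method names (LoginCheck, loginSucceeded) are hypothetical and not part of Wiki.java; the only assumption taken from the thread is that the returned page embeds wgUserName = "<username>" after a successful login.

```java
/** Hypothetical helper for illustration; not part of Wiki.java. */
class LoginCheck
{
    /**
     * Returns true if the page returned after login embeds the given
     * username in the wgUserName JavaScript variable. This does not
     * depend on the interface language of the wiki.
     */
    static boolean loginSucceeded(String html, String username)
    {
        if (html == null || username == null)
            return false;
        return html.contains("wgUserName = \"" + username + "\"");
    }
}
```

A caller would pass the full response body, e.g. LoginCheck.loginSucceeded(responseText, "ExampleBot"). The comparison is case-sensitive, so the username must be given exactly as the server reports it.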

getPageText() can use API

public String getPageText(String title) throws IOException
{
    // pitfall check
    if (namespace(title) < 0)
        throw new UnsupportedOperationException("Cannot retrieve Special: or Media: pages!");

    // go for it
    String url = query + "prop=revisions&rvprop=content&titles=" + URLEncoder.encode(title, "UTF-8");
    logurl(url, "getPageText");
    checkLag("getPageText");
    URLConnection connection = new URL(url).openConnection();
    setCookies(connection, cookies);
    connection.connect();
    BufferedReader in = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream()), "UTF-8"));
    String result = "";
    String content = "";

    // get the text
    String line = "";
    while ((line = in.readLine()) != null)
        result += line + "\n";

    if (result.indexOf("missing=\"\"") != -1)
        content = "(not yet written)";
    else if (result.indexOf("invalid=\"\"") != -1)
        content = "(Bad title)";
    // NOTE: the tag literals in the next two branches were lost from this post;
    // "<rev />", "<rev>" and "</rev>" are assumed, inferred from the +5 offset
    else if (result.indexOf("<rev />") != -1)
        content = "(empty)";
    else
        content = result.substring(result.indexOf("<rev>") + 5, result.indexOf("</rev>"));
    in.close();
    log(Level.INFO, "Successfully retrieved text of " + title, "getPageText");
    return decode(content);
}

Preceding unsigned comment added by 80.143.120.164 (talk • contribs)

Sorry about the wait - I only check this page when I release a new version. The current way avoids parsing any XML. Sometimes it's harder and slower to use the API - rollback is another example. {{wontfix}}. (I did, however, tweak the docs to detail what happens when exists(title)[0] == false). MER-C 10:56, 22 August 2008 (UTC)

:I'm having second thoughts about WONTFIXing this, the API's resolve redirects functionality could be handy here. MER-C 12:55, 22 August 2008 (UTC)
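For comparison, reading the page text out of an API response with the JDK's built-in XML parser, instead of string searches, could be sketched as below. The class and method names (ApiPageText, extractText) are hypothetical, and the response layout (a page element carrying missing or invalid attributes, with the text inside a rev element) is an assumption based on the attributes the posted code looks for.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration; not part of Wiki.java. */
class ApiPageText
{
    /**
     * Returns the text of the first rev element in an API XML response,
     * or null if the page is flagged missing/invalid or has no revision.
     * An empty page yields the empty string.
     */
    static String extractText(String xml)
    {
        try
        {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // assumed layout: <page ... missing="" /> or <page ... invalid="" />
            NodeList pages = doc.getElementsByTagName("page");
            if (pages.getLength() > 0)
            {
                Element page = (Element) pages.item(0);
                if (page.hasAttribute("missing") || page.hasAttribute("invalid"))
                    return null;
            }

            NodeList revs = doc.getElementsByTagName("rev");
            if (revs.getLength() == 0)
                return null;
            // the parser has already decoded entities such as &amp;
            return revs.item(0).getTextContent();
        }
        catch (Exception e)
        {
            throw new IllegalArgumentException("Could not parse API response", e);
        }
    }
}
```

The trade-off discussed in the thread still applies: this costs a full parse per request, but it avoids a separate entity-decoding step and does not break if attributes are added to the rev element.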

using rights and not groups for "apihighlimits"

Use rights, not groups, to choose the high limit. 'BOT' and 'ADMIN' are groups (see [http://en.wikipedia.org/w/api.php?action=query&list=users&ususers=MER-C&usprop=groups query("meta=userinfo&uiprop=rights|groups")]), but you treat them as rights (User.userRights()).

int limit = 500; // default
String result = query("meta=userinfo&uiprop=rights");
if (result.indexOf("apihighlimits") != -1)
    limit = 5000; // users with the apihighlimits right

:This adds a query for no real reason because the result of User.userRights() is cached. (Just tweak the source if the default doesn't apply to you.) The method is named after Special:Userrights before I realized they were groups. Implementing the whole permissions model would result in lots of public static final long (ints aren't good enough) spam and take 500+ lines. {{Later}}. MER-C 14:01, 23 August 2008 (UTC)

Upload bug?

Hi, I'm trying your code (great BTW) to upload files. There seems to be a problem with "special" chars in the destination filename and the description (see for example http://commons.wikimedia.org/wiki/File:Test%2Bkgoiyfyktgkggukgku.jpg):

  • Spaces in the dest filename will turn into "+"
  • Upload will say "Successfully uploaded" but fail when the dest filename contains a German Umlaut (äöüÄÜÖ)
  • Upload will say "Successfully uploaded" but fail when the dest filename contains a comma (,)
  • If upload succeeds, special characters in the wikitext will turn into gibberish

I tried to add "Content-Type:text/plain; charset=utf-8;" to the upload description and/or the wpDestFile (both with and without the content-type), but no luck. Do you know a quick fix? Cheers, --Magnus Manske (talk) 23:34, 7 August 2009 (UTC)

:Update: I've managed to clean up the contents by encoding it as iso-8859-1:

try {
    contents = new String(contents.getBytes("UTF-8"), "iso-8859-1");
} catch (UnsupportedEncodingException ex) {
    Logger.getLogger(BArchangleView.class.getName()).log(Level.SEVERE, null, ex);
}

No luck with the dest filename yet, though. I suppose the entire request should rather be utf-8 instead of these ugly hacks... --Magnus Manske (talk) 13:18, 8 August 2009 (UTC)

::Update 2: Got it working now! Here's the code of the entire function:

public synchronized void upload(File file, String filename, String contents) throws IOException, LoginException
{
    // TODO: API upload? Still in the pipeline, unfortunately.

    // throttle
    long start = System.currentTimeMillis();
    statusCheck();

    // check for log in
    if (user == null)
    {
        CredentialNotFoundException ex = new CredentialNotFoundException("Permission denied: you need to be registered to upload files.");
        logger.logp(Level.SEVERE, "Wiki", "upload()", "[" + domain + "] Cannot upload - permission denied.", ex);
        throw ex;
    }

    // UTF-8 voodoo
    try {
        contents = new String(contents.getBytes("UTF-8"), "iso-8859-1");
    } catch (UnsupportedEncodingException ex) {
        Logger.getLogger(BArchangleView.class.getName()).log(Level.SEVERE, null, ex);
    }

    // check if the page is protected, and if we can upload (incorporates lag check)
    String filename2 = filename.replaceAll(" ", "_");
    // String filename2 = URLEncoder.encode(filename.replaceAll(" ", "_"), "UTF-8");
    try {
        filename2 = new String(filename2.getBytes("UTF-8"), "iso-8859-1");
    } catch (UnsupportedEncodingException ex) {
        Logger.getLogger(BArchangleView.class.getName()).log(Level.SEVERE, null, ex);
    }
    String fname = "File:" + filename2;
    if (!checkRights(getProtectionLevel(fname), false))
    {
        CredentialException ex = new CredentialException("Permission denied: image is protected.");
        logger.logp(Level.WARNING, "Wiki", "upload()", "[" + domain + "] Cannot upload - permission denied.", ex);
        throw ex;
    }

    // prepare MIME type
    String extension = filename2.substring(filename2.length() - 3).toLowerCase();
    if (extension.equals("jpg"))
        extension = "jpeg";
    else if (extension.equals("svg"))
        extension += "+xml";

    // upload the image
    // this is how we do multipart post requests, by the way
    // see also: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.2
    String url = base + "Special:Upload";
    logurl(url, "upload");
    URLConnection connection = new URL(url).openConnection();
    String boundary = "----------NEXT PART----------";
    connection.setRequestProperty("Accept-Charset", "iso-8859-1,*,utf-8");
    connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
    setCookies(connection, cookies);
    connection.setDoOutput(true);
    connection.connect();

    // send data
    boundary = "--" + boundary + "\r\n";
    DataOutputStream out = new DataOutputStream(connection.getOutputStream());
    // DataOutputStream out = new DataOutputStream(System.out); // debug version
    out.writeBytes(boundary);
    out.writeBytes("Content-Disposition: form-data; name=\"wpIgnoreWarning\"\r\n\r\n");
    out.writeBytes("true\r\n");
    out.writeBytes(boundary);
    out.writeBytes("Content-Disposition: form-data; name=\"wpDestFile\"\r\n");
    out.writeBytes("Content-Type: text/plain; charset=utf-8\r\n\r\n");
    out.writeBytes(filename2);
    out.writeBytes("\r\n");
    out.writeBytes(boundary);
    out.writeBytes("Content-Disposition: form-data; name=\"wpUploadFile\"; filename=\"");
    out.writeBytes(filename);
    out.writeBytes("\"\r\n");
    out.writeBytes("Content-Type: image/");
    out.writeBytes(extension);
    out.writeBytes("\r\n\r\n");

    // write image
    FileInputStream fi = new FileInputStream(file);
    byte[] b = new byte[fi.available()];
    fi.read(b);
    out.write(b);
    fi.close();

    // write the rest
    out.writeBytes("\r\n");
    out.writeBytes(boundary);
    out.writeBytes("Content-Disposition: form-data; name=\"wpUploadDescription\"\r\n");
    out.writeBytes("Content-Type: text/plain\r\n\r\n");
    out.writeBytes(contents);
    out.writeBytes("\r\n");
    out.writeBytes(boundary);
    out.writeBytes("Content-Disposition: form-data; name=\"wpUpload\"\r\n\r\n");
    out.writeBytes("Upload file\r\n");
    out.writeBytes(boundary.substring(0, boundary.length() - 2) + "--\r\n");
    out.close();

    // done
    BufferedReader in;
    try
    {
        // it's somewhat strange that the edit only sticks when you start reading the response...
        String line;
        // in = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream()), "UTF-8"));
        in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        line = in.readLine();
        // while ((line = in.readLine()) != null) System.out.println(line);
        in.close();
    }
    catch (IOException e)
    {
        // retry once
        if (retry)
        {
            retry = false;
            log(Level.WARNING, "Exception: " + e.getMessage() + " Retrying...", "upload");
            upload(file, filename, contents);
        }
        else
        {
            logger.logp(Level.SEVERE, "Wiki", "upload()", "[" + domain + "] EXCEPTION: ", e);
            throw e;
        }
    }
    if (retry)
        log(Level.INFO, "Successfully uploaded " + filename, "upload");
    retry = true;

    // throttle
    try
    {
        long z = throttle - System.currentTimeMillis() + start;
        if (z > 0)
            Thread.sleep(z);
    }
    catch (InterruptedException e)
    {
        // nobody cares
    }
}

I still think the iso-hack is ugly, though... --Magnus Manske (talk) 16:07, 8 August 2009 (UTC)

:Yeah. I need to rewrite it for the upload API anyway, which will be with us on the next scap (Wikimania, perhaps?). Hopefully things will be saner then. MER-C 06:51, 9 August 2009 (UTC)
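A likely reason the ISO-8859-1 round trip works is that DataOutputStream.writeBytes(String) writes only the low-order byte of each character, so any character above U+00FF is mangled unless the string has first been turned into one char per UTF-8 byte. A less hacky alternative, sketched below, is to encode text fields to UTF-8 bytes explicitly and write those bytes. The class and method names (MultipartText, writeUtf8, textField) are hypothetical, and whether Special:Upload accepts the resulting request exactly as built here is untested.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Wiki.java. */
class MultipartText
{
    /** Writes the string as UTF-8 bytes, keeping non-Latin-1 characters intact. */
    static void writeUtf8(OutputStream out, String s) throws IOException
    {
        out.write(s.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Builds one text part of a multipart/form-data body.
     * The boundary is given without the leading "--".
     */
    static byte[] textField(String boundary, String name, String value)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try
        {
            writeUtf8(buffer, "--" + boundary + "\r\n");
            writeUtf8(buffer, "Content-Disposition: form-data; name=\"" + name + "\"\r\n");
            writeUtf8(buffer, "Content-Type: text/plain; charset=utf-8\r\n\r\n");
            writeUtf8(buffer, value);
            writeUtf8(buffer, "\r\n");
        }
        catch (IOException e)
        {
            // a ByteArrayOutputStream does not fail on write, but the signature requires this
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In upload(), the same idea would mean replacing out.writeBytes(filename2) and out.writeBytes(contents) with out.write(...getBytes("UTF-8")) and dropping the two conversion blocks.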

Bug in move()?

// success
if (temp.contains("move from"))
    in.close();
// failure
checkErrors(temp, "move");

Should be:

// success
if (temp.contains("move from"))
    in.close();
else
    // failure
    checkErrors(temp, "move");

? --Nat3738 (talk) 03:16, 8 October 2009 (UTC)

Issue with the APIs returning blank lines before actual response

This may occur in several places; I found the problem in login and edit.

These are the changes I made to make it work.

In login:

String line = in.readLine();
boolean success = line.contains("result=\"Success\"");
in.close();

becomes

String line;
boolean success = false;
while ((line = in.readLine()) != null) {
    if (line.contains("result=\"Success\"")) {
        success = true;
        break;
    }
}
in.close();

In edit, the call to checkErrors throws an exception if the first returned line is blank, even though subsequent lines contain the success message; you need to loop through the returned lines to check for success.

Glen.mccormick (talk) 13:56, 12 January 2010 (UTC)

:{{worksforme}} at least on WMF sites. You're probably thinking of the XML pretty-print format. MER-C 05:35, 12 February 2010 (UTC)
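If responses in a multi-line format (such as the pretty-printed XML mentioned above) need to be handled in more than one method, the loop could be factored into a small helper along these lines. The names ResponseScanner and anyLineContains are hypothetical and not part of Wiki.java.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration; not part of Wiki.java. */
class ResponseScanner
{
    /**
     * Reads the response line by line and reports whether any line
     * contains the marker. Blank leading lines are simply skipped over.
     * The reader is closed before returning.
     */
    static boolean anyLineContains(BufferedReader in, String marker) throws IOException
    {
        try (BufferedReader reader = in)
        {
            String line;
            while ((line = reader.readLine()) != null)
                if (line.contains(marker))
                    return true;
            return false;
        }
    }

    /** Convenience overload for a response already held in memory. */
    static boolean anyLineContains(String response, String marker)
    {
        try
        {
            return anyLineContains(new BufferedReader(new StringReader(response)), marker);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

In login, this would reduce the check to a single call such as ResponseScanner.anyLineContains(in, "result=\"Success\"").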

Small corrections

Hello MER-C,

I took the liberty of making three modifications to your code:

  • I corrected a bug when getCategories() is called on a nonexistent page or a page without categories
  • I corrected some javadoc
  • I corrected a bug when getImagesOnPage() is called on a nonexistent page or a page without images

But I did not modify the changelog.

I hope you don't mind.

In all cases, thanks a lot for your library and have a happy new year.

Best regards, Liné1 (talk) 07:46, 2 January 2011 (UTC)

:Thanks for the bug fixes. MER-C 09:33, 14 February 2011 (UTC)

== Android compatible ==

I'm using your code for some Android apps I'm writing at the moment. I had to change some things because Android's Java is missing some methods that standard Java has, e.g. isEmpty() on Strings had to be replaced with equals("").

So that I don't have to maintain the whole thing on my own, is there any chance I could maintain Android compatibility in your repo?

My mail is at Freakolowsky. 10x. —Preceding undated comment added 15:13, 23 May 2011 (UTC).

:[http://developer.android.com/reference/java/lang/String.html#isEmpty%28%29 Stop supporting stale versions of Android, then]. MER-C 03:38, 10 September 2012 (UTC)
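For anyone who still has to target a runtime where String.isEmpty() is unavailable, one way to avoid scattering equals("") through the code is a single compatibility helper, sketched below. StringCompat is a hypothetical name and not part of Wiki.java.

```java
/** Hypothetical helper for illustration; not part of Wiki.java. */
class StringCompat
{
    /**
     * Same result as String.isEmpty(), using only length(), which is
     * available on every Java runtime. Throws NullPointerException
     * for null, just as calling isEmpty() on null would.
     */
    static boolean isEmpty(String s)
    {
        return s.length() == 0;
    }
}
```

Call sites would then read StringCompat.isEmpty(title) instead of title.isEmpty(), which keeps the change mechanical and easy to revert.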

checkRights() bug

Hi, I'm developing Commons:VicuñaUploader and I found a bug related to cookies. If someone logs in without capitalizing the first letter of the username (e.g. "myaccount"), user.getUsername() returns "myaccount", but the cookies contain "Myaccount" as received from the server. As a result, a CredentialExpiredException is thrown when it shouldn't be. The same happens with spaces and underscores: the server returns a plus sign instead.

Fix below:

protected boolean checkRights(int level, boolean move) throws IOException, CredentialException
{
    // check if we are logged out
    String s = user.getUsername();
    s = s.substring(0, 1).toUpperCase() + s.substring(1); // first letter to upper case
    s = s.replace(" ", "+").replace("_", "+"); // spaces and underscores to plus
    if (!cookies.containsValue(s))
    {
        logger.log(Level.SEVERE, "Cookies have expired");
        logout();
        throw new CredentialExpiredException("Cookies have expired.");
    }
    //(...)

Cheers, Yarl 14:00, 8 September 2012 (UTC)

:[http://code.google.com/p/wiki-java/issues/detail?id=18 Noted]. MER-C 03:35, 10 September 2012 (UTC)

::OK, and is there an easy way to check upload progress? Yarl 12:59, 10 September 2012 (UTC)

:::The MW API is blocking serverside, so you will need to edit upload to update whatever progress bar you have. It is not possible to monitor single chunk uploads. MER-C 08:01, 17 September 2012 (UTC)

::::Might be fixed in r89 (not tested). MER-C 08:27, 17 September 2012 (UTC)
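As a rough illustration of "editing upload to update a progress bar", the file could be copied to the connection's output stream in fixed-size pieces, with a callback after each piece. ProgressCopy and ProgressListener are hypothetical names, not part of Wiki.java. Note also that HttpURLConnection buffers the request body unless a streaming mode (setFixedLengthStreamingMode or setChunkedStreamingMode) is enabled, so without that the callback only reports bytes handed to the connection, not bytes on the wire.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not part of Wiki.java. */
class ProgressCopy
{
    /** Callback invoked after each buffer has been written. */
    interface ProgressListener
    {
        void bytesSent(long sent, long total);
    }

    /**
     * Copies everything from in to out in 8 KiB pieces and reports
     * progress after each piece. Returns the number of bytes copied.
     * The total is only passed through to the listener.
     */
    static long copy(InputStream in, OutputStream out, long total, ProgressListener listener) throws IOException
    {
        byte[] buffer = new byte[8192];
        long sent = 0;
        int n;
        // read() may return fewer bytes than requested, so loop until end of stream
        while ((n = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, n);
            sent += n;
            if (listener != null)
                listener.bytesSent(sent, total);
        }
        out.flush();
        return sent;
    }
}
```

Inside upload(), this would stand in for the fi.available() / fi.read(b) / out.write(b) sequence, for example ProgressCopy.copy(fi, out, file.length(), listener), where the listener updates the progress bar.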