User talk:The Transhumanist/StripSearchInWikicode.js

: This script is functional, but when cut and pasted into WikEd, the results are double spaced. If you can figure out how the script can remove the extra linefeeds, please let me know. Thank you.

StripSearchInWikicode.js: strips search results down to bare pagenames and adds bullet list wikicode formatting for easy copying and pasting into articles. For Vector skin only.

= Script's workshop =

: This is the work area for developing the script and its documentation. The talk page portion of this page starts at #Discussions, below.

Description / instruction manual

StripSearchInWikicode.js: strips search results down to bare pagenames and adds bullet list wikicode formatting for easy copying and pasting into articles. It also removes redirected entries, and is especially useful for "intitle:" searches. For Vector skin only.

This reduces the search results to a list of links. It strips out the data between the page names, including that annoying "from redirect" note. It adds * to each entry so they look like this:

* [[John Wayne]]

* [[Clint Eastwood]]

* [[Brad Pitt]]

* [[Dwayne Johnson]]

* [[Tom Cruise]]

This makes it easier to copy and paste the links from search results into articles.

Once installed, the script automatically processes your Wikipedia search results.

To install, add this line to your vector.js page:

importScript("User:The Transhumanist/StripSearchInWikicode.js");

If you want the detail back in your search results, remove that line, or comment it out by placing two forward slashes (//) at the beginning of it.

{{User:The Transhumanist/Workshop boilerplate/Explanatory notes}}

= General approach =

The script uses the jQuery method .hide() for stripping the elements by class name. Here's an example of stripping out elements with the class name "searchalttitle":

$( ".searchalttitle" ).hide();

Learn about methods at https://www.w3schools.com/js/js_object_methods.asp

Learn about .hide at http://api.jquery.com/hide/

{{User:The Transhumanist/Workshop boilerplate/Bodyguard function}}

{{User:The Transhumanist/Workshop boilerplate/The ready event listener-handler}}

= Activation filters =

I didn't know what else to call these. I wanted the program to only work when intended, and only on intended pages (search result pages). So, I applied the [https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Building_blocks/conditionals conditional, if].

I use the Vector skin and haven't tested the script on any other skin, so the script basically says "if the Vector skin is in use, do what's between the curly brackets" (which includes the entire rest of the program).

// Only activate on Vector skin

if ( mw.config.get( 'skin' ) === 'vector' ) {

// Run this script only if " - Search results - Wikipedia" is in the page title

if (document.title.indexOf(" - Search results - Wikipedia") != -1) {
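
The two checks above can also be folded into one test. This is only a sketch: the helper name shouldActivate is made up for illustration and is not part of the script, and the browser part is guarded so the snippet also runs where mw and document do not exist.

```javascript
// Hypothetical helper (not part of the script): decide whether to run.
function shouldActivate(skin, title) {
  // Same two conditions as above: Vector skin, and a search results page title.
  return skin === "vector" &&
    title.indexOf(" - Search results - Wikipedia") !== -1;
}

// In the browser, the values would come from mw.config and document.title.
if (typeof mw !== "undefined" && typeof document !== "undefined") {
  if (shouldActivate(mw.config.get("skin"), document.title)) {
    // ... the rest of the script would go here ...
  }
}
```

Keeping the decision in a plain function makes it easy to try out with different skin names and titles without loading a search page.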

= Prep work =

There is no prep work in this script. This would be the declaration of global variables and so on.

= Core program =

This is the part that controls the main flow of the script (decides what to do under what circumstances):

if ( mw.config.get( 'skin' ) === 'vector' ) {

$( function() {

// hide elements by class per http://api.jquery.com/hide

$( ".searchalttitle" ).hide();

$( ".searchresult" ).hide();

$( ".mw-search-result-data" ).hide();

} );

}

So, what this does is 4 things:

First, it checks if the Vector skin is being used and runs the rest of the script only if it is.

Then it applies the jQuery method .hide on all elements labeled as any of these 3 classes: searchalttitle, searchresult, or mw-search-result-data.

To use an object method, you append it to the object it acts on (here, the jQuery object returned by $( ".classname" )), as is done with .hide() 3 times above. Don't forget the parentheses, and be sure to end your statements with a semicolon.

Learn more about .hide at http://api.jquery.com/hide/

== mw.config.get ( 'skin' ) ==

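This looks up the value for skin (the internal name of the currently used skin) in mw.config, the set of configuration values that MediaWiki makes available to scripts on the page.

[https://www.mediawiki.org/wiki/Manual:Interface/JavaScript#mw.config mw.config]

Note that mw.config.get is not the same thing as the [https://www.w3schools.com/jquery/ajax_get.asp jQuery get() Method], which requests data from a server.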

== comparison operators ==

"===" means "equal value and equal type"

[https://www.w3schools.com/js/js_comparisons.asp JavaScript Comparison and Logical Operators]

== Strip out the sister project results ==

// hide elements of Results from sister projects (per http://api.jquery.com/hide)

$( ".iw-headline" ).hide();

$( ".iw-results" ).hide();

$( ".iw-resultset" ).hide();

$( ".iw-result__title" ).hide();

$( ".iw-result__content" ).hide();

$( ".iw-result__footer" ).hide();

I went through the page source looking for the classes of the data displayed in the right-hand column, and inserted them into the code above. (I assume "iw" stands for "interwiki").
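
The six calls above could probably be collapsed into one, since a jQuery selector can list several classes separated by commas. A minimal sketch, assuming the same class names; the variable names are made up, and the jQuery call is guarded so the snippet does nothing harmful where jQuery is not loaded.

```javascript
// Class names taken from the hide() calls above.
const sisterClasses = [
  "iw-headline", "iw-results", "iw-resultset",
  "iw-result__title", "iw-result__content", "iw-result__footer"
];

// Build one comma-separated selector: ".iw-headline, .iw-results, ..."
const sisterSelector = sisterClasses.map(function (name) {
  return "." + name;
}).join(", ");

// With jQuery loaded, a single call then hides every match.
if (typeof $ === "function") {
  $(sisterSelector).hide();
}
```

If a class is added or renamed later, only the array needs to change.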

== Add wiki formatting to the list items ==

Change log

2017-09-29
Started script from a copy of User:The Transhumanist/StripSearch.js
2017-09-30
2017-10-01

Task list

= Bug reports =

= Desired/completed features =

: Completed features are marked with {{done}}

Development notes

= Adding the wikicode =

: Evad37 nailed it in discussion below

The elements that I wish to change have the class mw-search-result-heading.

Each one has an anchor element within it. Perhaps those can be sandwiched with the desired wikicode (between the double square brackets).

= removing the redirected entries =

: Evad37 nailed it in discussion below

Maybe using .splice could work, if regex could be applied somehow.

for (var i = 0; i < x.length; i++) {

// if current array item matches "searchalttitle"

// remove it from array

// x.splice(i, 1)

// i--

}

In the loop above, splicing (removing) the current item would shift the next item into its position. When the loop iterates to the next item, it will have inadvertently skipped one. After splicing, you'd have to decrement i by one.

Or use forEach, and...

push all non-matches in a new array, and at the end of forEach replace the original array with the new one.

Or, using standard for loop...

iterate over the array index and decrease the loop index i-- whenever you find a match

== more solutions ==

https://stackoverflow.com/questions/40700582/how-to-remove-objects-from-array-when-splice-reorders – create a filter function
[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/filter MDN - Array.prototype.filter]
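
The two approaches described above can be sketched on a plain array. The data here is a made-up stand-in (real search results are DOM elements, not objects with a redirect flag), so treat this as an illustration of the techniques only.

```javascript
// Hypothetical stand-in data: each entry records whether it came from a redirect.
const results = [
  { title: "Genre", redirect: false },
  { title: "Rapping", redirect: true },
  { title: "Pop music", redirect: true },
  { title: "Genre art", redirect: false }
];

// Option 1: filter() returns a new array and leaves the original alone.
const kept = results.filter(function (item) {
  return !item.redirect;
});

// Option 2: splice in place, looping backwards so a removal never
// shifts an unvisited item into an index that was already passed.
const inPlace = results.slice(); // work on a copy for the demo
for (let i = inPlace.length - 1; i >= 0; i--) {
  if (inPlace[i].redirect) {
    inPlace.splice(i, 1); // remove exactly one item at index i
  }
}
```

Looping backwards avoids the need to decrement i after each splice, which is the part that is easy to get wrong in a forward loop.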

= Improve the way the script hides =

: Seems like you could hide each entire search result and then unhide the element of interest, which is the pagename. --Izno (talk) 13:17, 29 September 2017 (UTC)

= Get rid of the extraneous linefeeds =

The search results are double spaced, which shows up as a blank line between each list item when you cut and paste to an edit window.

First, it might help to be able to see the control characters (like linefeed, \n). One way to look for them is with this:

// Inspect the raw text, so you can look for \n linefeeds

$(".mw-search-results").each(function(index) {

let mwsr_text = JSON.stringify($(this).text());

alert(mwsr_text);

});

This showed the text, but didn't show the linefeeds (\n). Logically, they must be there. The linefeed characters don't show up in the editor I cut and pasted them into. But the editor's search/replace is still able to find/replace them. Therefore, it might be possible to use regex in JS to get rid of them on the web page.

So, I tried the following code to remove linefeeds (\n), but it didn't work.

var str = $(".mw-search-results").html();

var regex = /\n/gi;

$(".mw-search-results").html(str.replace(regex, ""));

I tried it on \s, and it got rid of the linefeeds along with all the other white space characters. Which means they may be specifically accessible.
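
Since \s matches more than just \n (carriage returns, tabs, ordinary and non-breaking spaces, among others), it may help to find out which whitespace characters are actually present before trying to remove them. A minimal sketch, assuming a made-up helper name (whitespaceCensus) and a made-up sample string rather than real search-results markup:

```javascript
// Hypothetical helper: count each whitespace character in a string,
// keyed by its escaped form, so invisible characters become visible.
function whitespaceCensus(str) {
  const counts = {};
  const matches = str.match(/\s/g) || [];
  matches.forEach(function (ch) {
    // JSON.stringify writes a linefeed as a backslash followed by n,
    // wrapped in double quotes; slice(1, -1) drops the quotes.
    const key = JSON.stringify(ch).slice(1, -1);
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Made-up sample, not real search-results markup.
const sample = "<li>A</li>\r\n<li>B</li>\n\t<li>C D</li>";
const census = whitespaceCensus(sample);
```

On the live page, the string to inspect would be something like the result of $(".mw-search-results").html(). If the census shows no linefeeds at all, the blank lines are probably not coming from characters in the markup.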

= Discussions =

Post messages below.

Script to format search results as a list of page names with bullet list wikicode provided

: (Originally posted to User:Evad37).

I've written a script called StripSearch.js that unclutters search results to make them bare lists of page names. [Editor's note: The name was later changed to StripSearchSimple.js].

Now I'm writing a sequel to it called StripSearchInWikicode.js.

I would like the output of search results to look like this:

* [[Benjamin Franklin]]

* [[Larry Page]]

* [[Carl Sagan]]

* [[Hillary Clinton]]

* [[Warren Oates]]

...for easy copying and pasting into articles.

I'm having trouble manipulating the elements of class "mw-search-result-heading".

I gather that you put them into an array like this:

var x = document.getElementsByClassName("mw-search-result-heading");

I'd like to subject the items in that array to a regex, using the jQuery .each method, or the .each function, but I don't know how. The documentation is confusing as hell.

I think the search string () and replacement string * [[$1]] ought to work.

Any pointers would be most appreciated.

Sincerely, The Transhumanist 12:58, 29 September 2017 (UTC)

:{{ping|The Transhumanist}} You don't really need anything that complicated – you can just insert content before and after each element with class "mw-search-result-heading" using jQuery's [http://api.jquery.com/prepend/ prepend] and [http://api.jquery.com/append/ append] methods:

:$(".mw-search-result-heading").prepend('* [[').append(']]');

:just about does the trick. - Evad37 [talk] 13:51, 29 September 2017 (UTC)

::Or even better

::$(".mw-search-result-heading").children().before('* [[').after(']]');

::(this avoids leaving a space before the ]]) - Evad37 [talk] 13:53, 29 September 2017 (UTC)

::: You are right, the first method would be perfect if it didn't insert an extraneous space.

::: The second method inserts * unexpectedly on the same line after various entries, like this (searched for "genre"):

:::: * Genre

:::: * Genre art

:::: * Rapping *

:::: * Pop music *

:::: * Trap music *

::: Is there a way to apply regex, to avoid both problems?

::: Another feature I would love the script to have is to strip redirected entries out of the search results. Those are the mw-search-result-heading entries that include searchalttitle inside their divs. I would like to remove just those instances of mw-search-result-heading.

::: Adding that feature would probably also solve the bug in the second method you presented above.

::: Would .not work for this, to hide divs with the class mw-search-result-heading except for those that do not contain searchalttitle?

::: Unfortunately, I don't know how to apply regex to facilitate matches for this type of thing. I can construct regex strings, I just don't know how to put them into play.

::: Forgoing jQuery, I think a for loop could be set up like this:

:::

// Strip out redirected entries

var x = document.getElementsByClassName("mw-search-result-heading");

for (i = 0; i < x.length; i++) {

// somehow remove this entry if

// it contains element of class "searchalttitle"

}

::: But I don't know how to write the guts.

::: By the way, the script failed when I ran it with that empty for loop, and it failed when I tried sorting the array, like this:

:::

// Sort the search results

var x = document.getElementsByClassName("mw-search-result-heading");

x.sort();

::: It's enough to make one's head spin. :) The Transhumanist 20:50, 29 September 2017 (UTC)

::::Loops and regex aren't always the best tools, especially when working with collections of elements. jQuery has several ways to filter and refine results. One way would be to only apply * to the first-child elements within .mw-search-result-heading like so:

::::$(".mw-search-result-heading").children().filter(':first-child').before('* [[').after(']]');

::::Another way, like you alluded to above, is to first remove the searchalttitle elements, and then the * can be added safely:

::::$(".searchalttitle").remove();

$(".mw-search-result-heading").children().filter(':first-child').before('* [[').after(']]');

::::Or to remove instances of mw-search-result-heading which contain searchalttitle you can use .has():

::::$(".mw-search-result-heading").has(".searchalttitle").remove();

$(".mw-search-result-heading").children().before('* [[').after(']]');

::::Which can also be written slightly more succinctly like so:

::::$(".mw-search-result-heading").has(".searchalttitle").remove().end().children().before('* [[').after(']]');

::::Note that you can use .hide() instead of .remove() if you want to be able to show those elements again at some point. - Evad37 [talk] 02:52, 30 September 2017 (UTC)

{{od}} Wow. You make it look so easy. So, you chain methods to a selector. Nice. That sure is convenient. jQuery is simpler than I thought. When you chain methods to a class, they work on all the elements of that class. I was doing that with hide, but was just copying the examples and didn't really grasp the underlying structure. Thank you. And in retrospect, with loops and regex, it looks like I was trying to conduct surgery with an ice cream scoop. :)

I try to follow along in the documentation during these discussions, so that I can grasp the jargon. While doing so, I noticed this:

$(".searchalttitle").remove();

$(".mw-search-result-heading").children().filter(':first-child').before('* [[').after(']]');

can be refactored to this:

$(".searchalttitle").remove();

$(".mw-search-result-heading").children(':first-child').before('* [[').after(']]');

It seems to work!

The script is now operational, thanks to you. But, I came across an unforeseen obstacle. The results look great on the search results page, but when you copy and paste them into an edit page, there is a blank line between all the entries. That requires that the user regex them all out in WikEd. I'd like to eliminate that manual operation by removing the blank lines in the search results.

Also, when we remove the .mw-search-result-heading entries that contain .searchalttitle, additional blank lines are left behind. Is that a clue that can help us track those newlines (\n) down?

It is not apparent where the newlines are inserted in the page source for the search results page. So, I assume they are specified on a style sheet somewhere. What is the most effective way to hunt down the style sheet which defines a particular class used on a Wikipedia page? The Transhumanist 21:24, 30 September 2017 (UTC)

:It all seems to be very much browser dependent. Chrome gives me the expected result:


There is a page named "Genre" on Wikipedia

Genre
Yuri (genre)
Film genre
Literary genre
Harem (genre)
Genre studies
Music genre
Western (genre)
Bara (genre)
Genre fiction
Biblical genre
Epic (genre)
Genre art
Thriller (genre)

:Firefox adds spaces at the start of each line:


There is a page named "Genre" on Wikipedia
* Genre
* Yuri (genre)
* Film genre
* Literary genre
* Harem (genre)
* Genre studies
* Music genre
* Bara (genre)
* Western (genre)
* Genre fiction
* Biblical genre
* Epic (genre)
* Genre art
* Thriller (genre)

:IE adds several newlines between each item:


There is a page named "Genre" on Wikipedia

Genre


Yuri (genre)


Film genre


Harem (genre)


Literary genre


Genre studies


Music genre


Bara (genre)


Western (genre)


Genre fiction


Biblical genre


Epic (genre)


Thriller (genre)


Genre art

:That's all on windows 7. And you're presumably using some other browser/OS combination. Not really sure what the solution is though. - Evad37 [talk] 03:28, 1 October 2017 (UTC)

:: Since the removed items each leave behind a newline, my guess is that it's one newline per div. But what div? There is other formatting there, including alternating background colors, and a solid border between entries. If I can remove the divs that the removed entries were in, that might get rid of some of the extraneous new lines. The rest I won't know until I get a look at the style sheets. But I can't find the style sheets. Is there a way to trace a class back to the style sheet it is defined on? The Transhumanist 04:54, 1 October 2017 (UTC)

:: I got rid of the blank lines for the removed items by changing one of your lines of code to this:

// nuke "li" instead of ".mw-search-result-heading"

$("li").has(".searchalttitle").remove();

:: I'm wondering why the double spacing (extra newline) between list items doesn't show up in the page source. In WikEd, newline characters ("\n") are invisible, but its regex feature finds/replaces them anyways. Maybe the same concept can be applied. The Transhumanist 05:40, 1 October 2017 (UTC)

:: I tried this to get rid of each \n, and it didn't work:

var str = $(".mw-search-results").html();

var regex = /\n/gi;

$(".mw-search-results").html(str.replace(regex, ""));

:: But then I tried it on \s instead, and it got rid of the extra linefeeds (along with all other white space, turning the entries to mush -- separated list items of mush! This shows that the extraneous linefeeds are potentially specifically accessible.). Any ideas? The Transhumanist 08:35, 1 October 2017 (UTC)

:::Tracing styles: A lot of browsers have Web development tools ("dev tools" or "inspectors" or similar) that can show what styles an element currently has, and where they come from (e.g. in Chrome you can right-click on an area you're interested in and select Inspect).

:::Regex: \s is equivalent to [\r\n\t\f\v ], so one of those should work. There are various regex-testing websites you could use to test, analyse, explain, and experiment with regex patterns – I use https://regex101.com/ (just need to make sure the 'flavor' is javascript), but there are others out there.

:::{{gi|I'm wondering why the double spacing (extra newline) between list items doesn't show up in the page source.}} – Since I didn't have the problem with Chrome on Win 7, and FF/IE had different problems to what you're describing, I think it's basically down to browser bugs {{small|(or "features")}} – possibly MediaWiki is serving up (or the JavaScript modification is making) non-standard/non-compliant code, and the browsers have to decide for themselves how to handle it (thus some insert phantom spaces, others don't). - Evad37 [talk] 13:21, 4 October 2017 (UTC)

Fixing the double-spacing problem, and sorting it too

User:The Transhumanist/StripSearchInWikicode.js – the recent script you helped me on, which strips WP search results down to a bare list of links, and inserts wikilink formatting for ease of insertion of those links into lists. This is useful for gathering links for outlines. It still has the interlaced CR/LFs problem. Aside from that, I'd like this script to sort its results. So, if you know how, or know someone who knows how, please let me know.

: The Transhumanist, I've had a thought on how to fix the stripsearch script: What it should do is make an array containing the search result titles - which can be sorted and otherwise manipulated using standard array methods - and then remove all the search result stuff, and rebuild the links from the array in the format you want. jQuery's .map() or .get() functions should be able to make the array. - Evad37 [talk] 00:34, 27 October 2017 (UTC)

:: Thank you for the guidance. How would you "rebuild the links from the array"? The Transhumanist 02:19, 27 October 2017 (UTC)

:::You can make links from page titles using code like I've got in User:Evad37/extra.js's makeLink function. But in your case you need to also surround the link with * [[ and ]], and have the whole thing within a block tag like <div> or <p>. Do that for each item in the array, and then you can add them all to (or next to) an element on the page using a jQuery method like .before(), .after(), .prepend(), or .append(), each of which can take an array as the input. - Evad37 [talk] 02:40, 27 October 2017 (UTC)
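
The array approach suggested above might look roughly like this. It is a sketch under assumptions, not the script's actual code: the helper name titlesToWikicode is made up, and the browser part assumes that the first link inside each .mw-search-result-heading holds the page name and that the new list should go just before .mw-search-results. That part is guarded so the snippet also runs without jQuery.

```javascript
// Hypothetical helper: turn page titles into sorted bullet-list wikicode.
function titlesToWikicode(titles) {
  return titles
    .slice()                        // copy, so the caller's array is not reordered
    .sort(function (a, b) {
      return a.localeCompare(b);    // alphabetical, locale-aware
    })
    .map(function (title) {
      return "* [[" + title + "]]";
    });
}

const lines = titlesToWikicode(["Larry Page", "Carl Sagan", "Benjamin Franklin"]);

// In the browser: collect the titles, then rebuild one <div> per entry.
if (typeof $ === "function" && typeof document !== "undefined") {
  const titles = $(".mw-search-result-heading a:first-child")
    .map(function () { return $(this).text(); })
    .get();                         // .get() turns the jQuery result into a plain array
  const divs = titlesToWikicode(titles).map(function (line) {
    return $("<div>").text(line)[0]; // [0] extracts the DOM element
  });
  $(".mw-search-results").before(divs);
}
```

Because each entry ends up in its own block element with nothing in between, this may also sidestep the double-spacing problem, though that would need testing in each browser.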