JL-Bot


Not sure this is configured correctly

On this page:

Portal:Scouting/Recognized_content

It looks like a couple of topics are repeated, and one (the DYK section) isn't listed at all. How can I fix this? --evrik ^(talk) 16:25, 7 June 2022 (UTC)

@Evrik: Do you have examples of things that are repeated/not listed? Headbomb {t · c · p · b} 22:30, 7 June 2022 (UTC)

For the DYK section, there was an extra equals sign on the end of the line. I removed that and re-ran. The DYK section is now present. I'm not seeing any duplicate topics. Let me know what you are questioning and I will take a look at it. -- JLaTondre (talk) 01:06, 8 June 2022 (UTC)

@JLaTondre: thank you for the fix. @Headbomb: "Picture of the day pictures" seems to duplicate "Featured pictures", and "Featured lists" mirrors "Main page featured lists"; those are the two with the most overlap. --evrik ^(talk) 01:22, 8 June 2022 (UTC)
Well, that's because they're different things.
I suppose some condensed output, something like
- Neil Armstrong ( 2019-07-21)
could also be done. Headbomb {t · c · p · b} 03:00, 8 June 2022 (UTC)

That looks good. How would I code that on the page? --evrik ^(talk) 03:08, 8 June 2022 (UTC)

JLaTondre would first have to code support for that, and then the instruction would be at WP:RECOG if/when that's implemented. Headbomb {t · c · p · b} 05:04, 8 June 2022 (UTC)

The bot currently outputs by category. There is overlap in the categories (for example, I would assume all "Main page featured lists" are "Featured lists"), but they are not the same (not all featured content appears on the main page). So what is the ask here?

- When outputting a featured type (e.g. "Featured lists"), provide an option to add a Wikipedia icon if it appeared on the main page?
  1. It would then be up to the specifier to include only the one category type and the option; or
  2. The bot would display only the larger category if the option is set, even if both categories are specified.
- Provide an option to consolidate by page type (article, list, picture, sound), where it would show a different icon for each recognized type? So, for example, the new section could be "Recognized articles" and you would get a different icon and date (where applicable) for each of "Featured article", "Former featured article", "DYK", etc.

-- JLaTondre (talk) 11:55, 8 June 2022 (UTC)

Well, TFAs should be either current or former FAs. So a condensed option would, IMO, require that both FA and FFA are covered. And then the TFAs could be 'merged' into FA/FFA. Likewise for TFLs, which would be merged into both FL and FFL sections. So if you have

 |content-featured-articles
 |content-former-featured-articles
 |content-mainpage-featured

the output would be as is, but if you had something like

|content-featured-articles
|content-former-featured-articles
|content-mainpage-featured=condensed (or something equivalent)

then the output would be merged as above (or similar, depending on whether or not icons were desired). Headbomb {t · c · p · b} 12:38, 8 June 2022 (UTC)

@JLaTondre: if you run the bot on Portal:Scouting/Recognized_content I can do a full mockup. Headbomb {t · c · p · b} 01:55, 12 June 2022 (UTC)

Done. -- JLaTondre (talk) 15:40, 12 June 2022 (UTC)

Mockup. POTD icons will be supported once Template_talk:Icon#POTD_support is enacted. Headbomb {t · c · p · b} 16:32, 12 June 2022 (UTC)

@JLaTondre and Headbomb: Thank you both! --evrik ^(talk) 02:33, 13 June 2022 (UTC)

FM captions

Is there a reason why the "caption" option actually displays the media's title rather than its caption? The titles are so rarely helpful, while the captions would definitely be! MeegsC (talk) 17:38, 7 August 2022 (UTC)

Primarily performance, but also the lack of standard formats. Captions are on the pages that use the images, and it would add significant time to pull them. The task already takes most of a day to run. Images can also appear on multiple pages with significantly different captions. Captions are not always in a standard format, which makes pulling them from the page text problematic. -- JLaTondre (talk) 12:00, 9 August 2022 (UTC)

Could we not use the captions that are in the picture file, and default to the title only if the picture file doesn't have an English caption? Most file captions are better than the title! MeegsC (talk) 13:14, 9 August 2022 (UTC)

The description field? Yes, that might work. I will look into it. -- JLaTondre (talk) 16:36, 13 August 2022 (UTC)

Highlight journal= from different character set

If, for example, you have |journal=Аcta Вaltico‑Slavica, where А and В come from the Cyrillic alphabet and the others from the Latin alphabet, it would be useful in the compilation to highlight this sort of thing, i.e. when an entry has characters from two different alphabets. If it's from a single alphabet, no highlighting is needed.

Journal¹	Type²	Target¹	Type²	Citations	Articles	⁠Citations/article⁠	Search
Аcta Вaltico‑Slavica	?	Acta Baltico-Slavica	?	1	1	1.000	Wikipedia _(J·M·T) Google _(J·M·T)

In general, there could be a color scheme like

Red = Latin
Orange = Arabic
DarkKhaki = Chinese
Green = Cyrillic
Blue = Greek
Indigo = Hebrew
Violet = Japanese
DeepSkyBlue = Other1
MediumPurple = Other2 (only used when Other1 is already used)
DeepPink = Other3 (only used when Other2 is already used, might not be needed)

Would this be difficult to implement? Headbomb {t · c · p · b} 05:30, 27 August 2025 (UTC)

Yes, that is doable. Perl, which is what that part of the citation processing is written in, makes it easy to check language scripts. Perl can recognize all the ones listed at perlunicode#Scripts (all the ones you are requesting are on that list). For Chinese, it would really be detecting Han script, which in my understanding is used for several Eastern languages. It will probably be a couple of weeks before I can complete it. -- JLaTondre (talk) 23:14, 27 August 2025 (UTC)
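
The bot's actual implementation is in Perl and relies on Perl's Unicode script properties; it is not shown in this thread. As a rough illustration of the idea only, here is a minimal Python sketch. It assumes that the first word of a character's Unicode name is a good enough stand-in for its script, which is an approximation, and the names `NAME_PREFIX_TO_SCRIPT`, `script_of` and `scripts_in` are hypothetical.

```python
import unicodedata

# Assumption: the first word of a character's Unicode name approximates its
# script. This is a rough stand-in for Perl's script properties, not the
# bot's real logic.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "ARABIC": "Arabic",
    "HEBREW": "Hebrew",
    "CJK": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "HANGUL": "Hangul",
    "ARMENIAN": "Armenian",
}


def script_of(ch):
    """Return a script label for a letter, or None for non-letters."""
    if not ch.isalpha():
        return None  # digits, spaces and punctuation carry no script here
    name = unicodedata.name(ch, "")
    prefix = name.split(" ")[0].split("-")[0]
    return NAME_PREFIX_TO_SCRIPT.get(prefix, "Other")


def scripts_in(text):
    """Return the set of script labels used by the letters in text."""
    return {s for s in map(script_of, text) if s is not None}


# The example from the request: Cyrillic А and В inside a Latin title.
mixed = "\u0410cta \u0412altico\u2011Slavica"
print(sorted(scripts_in(mixed)))                   # ['Cyrillic', 'Latin']
print(sorted(scripts_in("Acta Baltico-Slavica")))  # ['Latin']
```

An entry would be a highlighting candidate when `scripts_in` returns more than one label.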

Yes, if it's the Han alphabet, then that's the character set that should be highlighted. The point is to detect names that have multiple character sets in them, which should be rare, and usually limited to cases like |journal=The Journal of Things = Το ημερολόγιο των πραγμάτων.

It's probably simpler to collect them and have them all reported on their own WP:JCW/Multiscript subpage, with that highlighting only in effect on that page.

Headbomb {t · c · p · b} 00:59, 28 August 2025 (UTC)

A separate page is easier. I can have a separate script for that vs. integrating into the regular output. -- JLaTondre (talk) 23:37, 28 August 2025 (UTC)

Should it report cases where there is a language template? For example, what should it do with Sidirotrohia ({{langx|el|Σιδηροτροχιά}}) which will produce Sidirotrohia (Σιδηροτροχιά) (after the change discussed below)? There are also cases where people enter titles in multiple languages without the use of a template. Should it only report a mismatch when it happens within a single word? -- JLaTondre (talk) 23:53, 28 August 2025 (UTC)

If there's a language template, that can be ignored IMO. I suppose to start, mismatches could happen across multiple words; this way it could catch things like Acta Whatever А. Headbomb {t · c · p · b} 00:12, 29 August 2025 (UTC)

I have the code to detect multiple scripts completed. It is returning 2,325 cases in the last dump. The majority are of the format of a single non-Latin script followed by Latin script (or vice versa). For example:

한국한문학연구 (Korean Literature Research)
한국전자통신학회 논문지 = the Journal of the Korea Institute of Electronic Communication Sciences
한국언어문화 [Journal of Korean Language and Culture]
ЕтноАнтропоЗум / EthnoAnthropoZoom
Езиков свят - Orbis Linguarum
Military History Studies (军事历史研究)
Linguistic Sciences 语言科学
Acta Historica: Труды По Историческим И Обществоведческим Наукам
7iber | حبر

Should I exclude cases where it is a single non-Latin script + space or punctuation + a Latin script (and the reverse order)? It seems like these are valid cases and you are more interested in ones like Artanіya which has a Cyrillic і in the middle of Latin characters? -- JLaTondre (talk) 14:06, 13 September 2025 (UTC)

I think Script 1 [Separator] Script 2 can be excluded without much loss, so long as each script has 3+ letters in it. This way it excludes Езиков свят - Orbis Linguarum, but includes Acta Whatever А. Headbomb {t · c · p · b} 15:06, 13 September 2025 (UTC)
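
One way to read the exclusion rule above is sketched below in Python. This is an interpretation, not the bot's Perl code: the script label is again approximated by the first word of the Unicode name (so kana and Han would count as different scripts here), and `script_of`, `script_runs` and `should_report` are hypothetical names.

```python
import unicodedata


def script_of(ch):
    # Assumption: first word of the Unicode name approximates the script.
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0].split("-")[0]


def script_runs(text):
    """Return runs as [script, letter_count, separated_from_previous]."""
    runs = []
    gap = False  # True if a non-letter was seen since the last letter
    for ch in text:
        script = script_of(ch)
        if script is None:
            gap = True
            continue
        if runs and runs[-1][0] == script:
            runs[-1][1] += 1  # same script continues, even across spaces
        else:
            runs.append([script, 1, gap])
        gap = False
    return runs


def should_report(text, min_letters=3):
    """Report mixed-script names unless they are 'Script 1 [sep] Script 2'."""
    runs = script_runs(text)
    if len({r[0] for r in runs}) < 2:
        return False  # single script (or no letters): nothing to report
    clean_pair = (
        len(runs) == 2
        and runs[1][2]  # the two runs are separated by a non-letter
        and all(r[1] >= min_letters for r in runs)
    )
    return not clean_pair
```

Under this reading, Езиков свят - Orbis Linguarum is skipped (two separated runs of 3+ letters each), while Acta Whatever А and Artanіya are still reported.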

Code is complete, except for saving to Wikipedia. I uploaded the first 200 results at Wikipedia:JCW/Multiscript1. Please review and let me know what you think. If all is good, I will automate the saving (including having a legend for the colors). -- JLaTondre (talk) 15:24, 20 September 2025 (UTC)

Looks good as far as I can tell. I might have refinements down the road, but I'll need to actually work with it for a bit to know. Headbomb {t · c · p · b} 16:12, 20 September 2025 (UTC)

Saving has been implemented and the results have been uploaded. Take a look and see if everything looks good. The color legend in {{JCW-bottom-scripts}} is filled in by the bot along with the database dump date. That way, if there are ever changes in the colors, the legend will be automatically updated by the code changes. If you want any changes to the legend, let me know and I will incorporate them into the bot. I will leave it to you to add the multiscript pages to {{JCW-Main}} so they are listed where you want them.

The 20th dump completed today. The entire processing, including the multiscript, should happen tonight. So that will verify the whole chain works. -- JLaTondre (talk) 19:02, 21 September 2025 (UTC)

Awesome. I've cleared so many issues it'll be good to have a new dump. It won't reflect what I fixed after the 20th, but still. Headbomb {t · c · p · b} 20:38, 21 September 2025 (UTC)

BTW, I moved them to .../Maintenance/Multiscript[1-4]. Headbomb {t · c · p · b} 20:48, 21 September 2025 (UTC)

I notice Hiragana = Violet, Katakana = Violet; is Kanji not a distinct set of characters? Headbomb {t · c · p · b} 21:11, 21 September 2025 (UTC)

Right, they're probably from the Han set... never mind. Headbomb {t · c · p · b} 21:16, 21 September 2025 (UTC)

I updated the code for the new location & verified it worked. -- JLaTondre (talk) 21:56, 21 September 2025 (UTC)

Could you add..

Unbalanced brackets? Like any entry like Quart. J. Math (Oxford.

Thinking <>, [], (), {}, ‹›, «», ⟨⟩, ≪≫

Thinking they could be added as a second line to WP:JCW/Invalid. Headbomb {t · c · p · b} 15:07, 9 September 2025 (UTC)
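
For illustration, a check for the bracket pairs listed above can be sketched with a simple stack. This is a hypothetical Python helper (`has_unbalanced_brackets`), not the mechanism the bot uses, and it would also flag legitimate uses such as a lone `<` in a title or reversed guillemets (»...«).

```python
# Opening bracket -> expected closing bracket, for the pairs listed above.
PAIRS = {
    "<": ">", "[": "]", "(": ")", "{": "}",
    "‹": "›", "«": "»", "⟨": "⟩", "≪": "≫",
}
CLOSERS = set(PAIRS.values())


def has_unbalanced_brackets(text):
    """Return True if any bracket is unclosed, unopened, or mis-nested."""
    stack = []  # closers we still expect, innermost last
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != ch:
                return True
    return bool(stack)  # leftovers mean something was never closed


print(has_unbalanced_brackets("Quart. J. Math (Oxford."))  # True
print(has_unbalanced_brackets("Quart. J. Math (Oxford)"))  # False
```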

I will do this after the above items. -- JLaTondre (talk) 22:27, 9 September 2025 (UTC)

I implemented a new regex matching option for Patterns using a {{JCW-regex}} option via the config. This allowed me to relatively easily add the unbalanced brackets/parentheses/etc. case and an all-punctuation case (to catch the "." citations mentioned above). Instead of adding special processing, this more general implementation allows for other uses down the road.

The results are uploading to WP:JCW/PAT now. For the all-punctuation case, there are more than 5 articles for the period case and for a "<>" case. That makes the results a little less useful since the articles don't get listed. I can just clean up those cases myself (since it is easy enough for me to manually pull the article list). I figure once the current ones are fixed, there are likely to be fewer than 5 per dump.

I still need to create the {{JCW-regex}} page so it renders on the config page. -- JLaTondre (talk) 19:35, 20 September 2025 (UTC)

Sounds great! Headbomb {t · c · p · b} 22:34, 20 September 2025 (UTC)

{{JCW-regex}} has been created. I did it as a regular template. I know {{JCW-pattern}} uses a Lua module. If that is better, I will leave it for you to convert as I'm not familiar with those. As the 20th dump completed today and will be processed tonight, I will clean up the punctuation-only cases after those results post. -- JLaTondre (talk) 19:07, 21 September 2025 (UTC)

Bug

In WP:JCW/Multiscript1, there's a Latin-only entry "Das kulturelle Erbe von Arzach". If you go to the article, you have |journal=Das kulturelle Erbe von Arzach | Արցախի մշակութային ժառանգությունը: Armenische Geschichte und deren Spuren in Berg-Karabach | Հայոց պատմությունը և դրա հետքերը Լեռնային Ղարաբաղում

The bot should be able to recognize {{!}}. Headbomb {t · c · p · b} 22:41, 21 September 2025 (UTC)

If you edit the Multiscript1 page, the full annotated name is: | : | . It needs the |'s escaped when placed into the template so it doesn't break the template. I will fix that. -- JLaTondre (talk) 01:12, 22 September 2025 (UTC)
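
The escaping described above can be sketched as follows. `{{!}}` is the MediaWiki magic word that renders as a literal pipe; everything else here (the `escape_pipes` and `template_call` helpers and the template name `Example`) is hypothetical, and the sketch deliberately ignores pipes that sit inside wikilinks or nested templates, which would need separate handling.

```python
def escape_pipes(value):
    """Replace literal pipes so the value can sit in a template parameter."""
    return value.replace("|", "{{!}}")


def template_call(name, *params):
    """Build a template call with every positional parameter pipe-escaped."""
    return "{{" + name + "".join("|" + escape_pipes(p) for p in params) + "}}"


print(template_call("Example", "Title A | Title B", "1"))
# {{Example|Title A {{!}} Title B|1}}
```

Without the escaping, the second half of the journal name would be parsed as an extra template parameter, which matches the symptom reported above where only the first segment was displayed.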

Right, I dug too deep. At least looking silly isn't as dangerous as Durin's Bane. Headbomb {t · c · p · b} 02:22, 22 September 2025 (UTC)

Fixed. Let me know if you see anything else wrong. -- JLaTondre (talk) 23:09, 22 September 2025 (UTC)

The only thing I see is a nitpick about brackets sometimes being the wrong color, like in Dragon Ball 大全集 2: Story Guide [Dragon Ball Complete Works 2: Story Guide] or Hōsō Kenkyū to Chōsa [NHK Monthly Report on Broadcast Research] 放送研究と調査 or Lj. Maksimović, M. Ricl, ΤΗ ΠΡΟΣΦΙΛΕΣΤΑΤΗ ΚΑΙ ΠΑΝΤΑ ΑΡΙΣΤΗ ΜΑΚΕΔΟΝΙΑΡΧΙΣΣΗ: Students and Colleagues for Professor Fanoula Papazoglou (International Conference, Belgrade, October 17–18, 2017). Headbomb {t · c · p · b} 02:07, 23 September 2025 (UTC)

Punctuation and numbers are valid in multiple scripts. Perl classifies them as Common script. The code currently leaves them uncolored if they fall between different scripts and groups them with a script if they fall between two ranges of the same script. Many times that makes sense, but not always. The other option is to never color them. Should I change it to that? -- JLaTondre (talk) 10:26, 23 September 2025 (UTC)

Yeah, probably best to leave brackets, numbers, and other 'neutral' marks in black. Headbomb {t · c · p · b} 11:56, 23 September 2025 (UTC)
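
The "never color neutral marks" option can be illustrated with a short Python sketch. It is not the bot's code: the script label is approximated from the Unicode name, the `<span>` markup is an assumed output format, all unlisted scripts share one fallback color instead of the Other1/Other2/Other3 scheme, and `COLORS`, `script_of` and `colorize` are hypothetical names.

```python
import unicodedata
from itertools import groupby

# Colors follow the scheme proposed earlier in the thread; Han is mapped to
# the "Chinese" color and both kana scripts to the "Japanese" color.
COLORS = {
    "LATIN": "Red", "ARABIC": "Orange", "CJK": "DarkKhaki",
    "CYRILLIC": "Green", "GREEK": "Blue", "HEBREW": "Indigo",
    "HIRAGANA": "Violet", "KATAKANA": "Violet",
}
FALLBACK = "DeepSkyBlue"  # simplification: one color for every other script


def script_of(ch):
    # Assumption: first word of the Unicode name approximates the script.
    if not ch.isalpha():
        return None  # digits, brackets, spaces, punctuation stay neutral
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0].split("-")[0]


def colorize(text):
    """Wrap each run of same-script letters in a colored span."""
    parts = []
    for script, chars in groupby(text, key=script_of):
        run = "".join(chars)
        if script is None:
            parts.append(run)  # neutral marks are never colored
        else:
            color = COLORS.get(script, FALLBACK)
            parts.append('<span style="color:%s">%s</span>' % (color, run))
    return "".join(parts)
```

With this approach a bracketed translation such as `[Story Guide]` gets colored letters but plain brackets, regardless of which scripts surround it.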

Change made & updated results uploaded. -- JLaTondre (talk) 22:22, 24 September 2025 (UTC)

Looks good! Headbomb {t · c · p · b} 23:35, 24 September 2025 (UTC)
