Commons:Requests for comment/Technical needs survey

From Wikimedia Commons, the free media repository

This survey is not finalized yet; the method or timeline might change.

Background

Commons is facing many technical problems in the form of bugs, broken tools, and missing features. In September 2022 the Commons:WMF support for Commons started working on some of these. A recent discussion on the Village pump showed that we never really decided what we as Commons users need most. This survey should fill this gap and result in a priority list of the most urgent problems.

Much of this was already discussed in the open letter of 2022: Commons:Think big - open letter about Wikimedia Commons.

Method

This survey uses the same method as the annual m:Community Wishlist Survey of the WMF on Meta and de:Wikipedia:Technische Wünsche of Wikimedia Germany.

Timeline

  • Until 24 December 2023 discuss the procedure of this survey and change it if needed (proposals can already be made but might need to be adjusted later)
  • Until 14 January 2024 submit and discuss proposals
  • 15 January 2024-21 January 2024 clustering and merging of proposals if needed
  • 22 January 2024-15 February 2024 vote on the proposals

Resulting list

During the proposal and voting phases all proposals are treated the same, but after the voting there will be two separate lists: one for fixing existing functionality and tools and one for requested new features. Please consider this when creating proposals, and split fixes and new feature requests for one tool into two proposals.

Proposals


Confirm .mp4 for 2028

Description of the Problem

  • Problem description:
The most popular video format is .mp4 (MPEG-4 Part 14 (Q336316), en:MP4 file format), but unfortunately it is a closed format under patent until 2028. Please confirm that technical and legal plans are in place to begin accepting and playing videos in this format as soon as possible.
  • Proposal type: feature request
  • Proposed solution:
Give legal and technical confirmation to the Commons community that .mp4 files are acceptable for Commons upload from 1 January 2028.
Review previous discussions
  • Phabricator ticket:
  • Further remarks:


Bluerasberry (talk) 01:03, 9 January 2024 (UTC)

Discussion

 Info I think this topic is a bit complicated. MP4 is a container format rather than a codec. If we want to upload free MP4 files, we have to examine which codecs were used and when their protection expires. We have H.264 and, more recently, HEVC; it would take some time until the protection expires --PantheraLeo1359531 😺 (talk) 19:26, 9 January 2024 (UTC)
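To illustrate the container-versus-codec point above, here is a minimal sketch in Python that walks the box structure of an MP4 (ISO base media file format) byte string and lists the sample entry codes found in its `stsd` boxes (for example `avc1` for H.264 or `hvc1`/`hev1` for HEVC). The function names and the `make_box` helper are hypothetical and only serve the illustration; the sketch reads the whole file into memory and is not a substitute for a real media inspection tool.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Boxes that only contain other boxes on the path down to the codec information.
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl"}


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if offset + 16 > end:
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed box: stop instead of looping forever
        yield raw_type.decode("latin-1"), offset + header, offset + size
        offset += size


def sample_entry_codes(data: bytes) -> List[str]:
    """Return the four-character codes of all sample entries (one per track)."""
    codes: List[str] = []

    def walk(start: int, end: int) -> None:
        for box_type, payload_start, box_end in iter_boxes(data, start, end):
            if box_type in CONTAINERS:
                walk(payload_start, box_end)
            elif box_type == "stsd":
                # stsd payload: 4 bytes version/flags, 4 bytes entry count, then entries
                for entry_type, _, _ in iter_boxes(data, payload_start + 8, box_end):
                    codes.append(entry_type)

    walk(0, len(data))
    return codes


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Hypothetical helper for building small test inputs."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

A review process could use such a listing to decide per file whether every track uses a codec that is acceptable at that point in time, instead of judging by the `.mp4` extension alone.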

Support textured meshes on Commons

Description of the Problem

  • Problem description:

The topic was raised a few times before, but as it is very important to me, I want to put it on the schedule.

Commons now supports meshes as STL files. STL files do not support textures. But for some 3D models, textures are really necessary or mandatory. An apple as a shape only is not really useful, but with a texture, it is.

  • Proposal type: bugfix / feature request / process request

Feature request

  • Proposed solution:

Adding a new free file extension to Commons, like fbx.

  • Phabricator ticket:

phab:T246901

Discussion

  •  Support The earlier that is done the better. People upload untextured STL files where textured 3D models are available so this is a lot of redundancy and extra work building up as well as still missing out on textured 3D models vs grey bad ones. Asked about this here recently before I found the code issue. Maybe not a top priority issue but close to that and good to fix asap. Then, please also create a proper info page about 3D models on WMC. --Prototyperspective (talk) 10:36, 5 January 2024 (UTC)
  •  Support Per the nominator's rationale, the new file type would benefit Commons by adding more quality content to the project. Johnson524 (talk) 14:42, 5 January 2024 (UTC)
  •  Question what do you mean by "motifs"? I suspect this is a false cognate from some other languages, the English meaning of "motifs" makes no sense here. - Jmabel ! talk 20:04, 5 January 2024 (UTC)
 Info I changed the word :) --PantheraLeo1359531 😺 (talk) 17:25, 6 January 2024 (UTC)

Metadata editing tool

Description of the problem

  • Proposal type: feature request
  • Proposed solution: a tool that can:
  1. edit exif
  2. not cause corruption to files
  3. be lossless editing if possible? if not, it should still minimise the change in quality and/or filesize.
it'd be great if such a tool can be developed and hosted like the croptool or rotatebot.
maybe something already exists on the internet.--RZuo (talk) 06:44, 4 January 2024 (UTC)
  • Phabricator ticket:
  • Further remarks:

Discussion

  •  Support.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 13:09, 4 January 2024 (UTC)
  •  Oppose Not one of the most critical features, just a very useful one.
It would be nice if one could just tick a "remove exif data" checkbox. There are tools for that that allow you to do so for many images at once and a gadget could be possible too since most people don't need it and exif data could provide additional information. --Prototyperspective (talk) 13:53, 4 January 2024 (UTC)
A feature for removing location metadata could also be useful - some people are sensitive about revealing their locations.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 14:00, 4 January 2024 (UTC)
I think that would be better than editing exif data as the latter would only result in more hard-to-detect faulty metadata (e.g. mainly create new problems rather than solve any tangible ones). Just giving the people the ability to remove exif data (all or location data) would probably be better. Prototyperspective (talk) 14:04, 4 January 2024 (UTC)
Unless I am missing something, exif data could be easily removed by using image editing software, on the user's computer or smartphone. If the image has already been uploaded to Commons, it could be overwritten with the version without exif data, and one could then ask for the previous version to be deleted. I think that general image editing features are out of Commons scope. MGeog2022 (talk) 14:36, 4 January 2024 (UTC)
If it really isn't easily possible, I think this should be proposed to developers of some image editing free software, such as GIMP, since it would be very useful to many people outside Commons. MGeog2022 (talk) 14:41, 4 January 2024 (UTC)
When you export a file with GIMP you can simply uncheck exif data. Please check in advance, for example by doing a Web search if you don't have that software. It's also possible to remove exif data of all files in a directory. Removing the exif data at upload would make it a) more convenient especially if you only use this rarely and would like to use this for only some but many images b) more accessible to users. I already argued that it's not an important issue unlike several other proposals and would only be useful to have at some point. Prototyperspective (talk) 15:49, 4 January 2024 (UTC)
@Prototyperspective, OK, it seemed a bit odd to me to have something similar to image editing features here, but I won't oppose any proposal that is useful (reading the comments above, I thought almost no software could easily remove exif data without creating problems, that's why I said to ask for this feature to be included in software such as GIMP). MGeog2022 (talk) 20:25, 4 January 2024 (UTC)
  •  Comment I think there is a broader question here: how much do we want to provide by way of file-editing tools for files that have already been uploaded? So far, I believe the only ones we have are crop & rotate. (Am I missing something?) I'm not sure if editing EXIFs is the very next priority (some sort of contrast manipulation might be a higher priority). Plus, it isn't clear to me that this particular need arises all that often. It isn't all that hard to download-fix-reupload. - Jmabel ! talk 19:50, 4 January 2024 (UTC)
  •  Comment Also: if this is specific to EXIF, let's say "EXIF" not "Metadata". "Metadata" can mean a lot of things, including SDC and everything in the wikitext of a file page. - Jmabel ! talk 19:54, 4 January 2024 (UTC)
    Well.... Let's say embedded metadata. Because the metadata box at the end of the page can be EXIF, XMP, IPTC, file native and probably at least one other metadata format that I forgot about. —TheDJ (talkcontribs) 10:06, 5 January 2024 (UTC)
  •  Info I think it would be very useful to look into the metadata of files that were uploaded, but not published. So you can check before uploading in a simple way --PantheraLeo1359531 😺 (talk) 09:48, 5 January 2024 (UTC)
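As a rough illustration of the "remove exif data" idea discussed above, the following Python sketch drops EXIF segments from a JPEG byte string without touching the compressed image data, so the pixels are not re-encoded. It is a minimal sketch under stated assumptions: it only handles JPEG, it only removes APP1 segments that start with the `Exif` identifier (XMP, IPTC and other embedded metadata are left in place), and the function name `strip_exif` is hypothetical.

```python
def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with its EXIF (APP1) segments removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: the compressed image data follows, copy the rest as-is.
            break
        # The 2-byte big-endian length counts itself but not the marker bytes.
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)
```

Removing only location data, as suggested in the discussion, would require parsing the TIFF structure inside the EXIF segment and is deliberately not attempted here.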

"Building block" tool to select files

Description of the Problem

  • Problem description: We have several tools for Commons maintenance that, as part of their action, select files or file pages (and in some cases, where it makes sense, category pages) to be acted upon. Each currently has its own selection mechanism. Each provides methods of selection that would be useful in the others.

Tools involved include at least [feel free to edit]:

  • VFC:
    • Can select from a category? Yes
    • Can select from a search result? Yes
    • Can select subcategory pages to be acted upon? No, even though at times it would make sense (e.g. to notify for a mass CfD.)
    • Can select file pages to be acted upon? Yes
    • Further remarks: can be a bit slow to load files for selection (which it does in batches of 100). Has some good methods to hover and see information about a file. Easy to open any given file in a new window during selection.
  • Cat-a-lot:
    • Can select from a category? Yes
    • Can select from a search result? Yes
    • Can select subcategory pages to be acted upon? Yes
    • Can select file pages to be acted upon? Yes
    • Further remarks: Uses the regular cat page loading to allow files/subcats to be chosen, which is a lot quicker to load than VFC, but does not allow the hovering for information; also means you can only act on the content of one page at a time (200 files).
  • massrename:
    • Can select from a category? Yes
    • Can select from a search result? No, and this would certainly make sense
    • Can select subcategory pages to be acted upon? No, and this would certainly make sense
    • Can select file pages to be acted upon? Not really. Acts on all filenames that match the regex, and there is no convenient way to say "except these." In practice, you can use VFC or Cat-a-lot to add a temporary maintenance category to the files you want to act on, then act on them with massrename, then use VFC or Cat-a-lot to remove the temporary maintenance category
    • Further remarks: Provides a regex-based selection of files that would be nice to be able to combine with other methods of selection.
  • AWB:
    • Can select from a category? Yes
    • Can select from a search result? Yes
    • Can select subcategory pages to be acted upon? No, and this would certainly make sense
    • Can select file pages to be acted upon? Yes, but not visually. Acts on all pagenames that match the regex, and there is no convenient way to say "except these," without using the list management features or deleting pages from the list. In practice, you can use AWB, VFC, or Cat-a-lot to add a temporary maintenance category to the pages you want to act on, then act on them with AWB, then use AWB, VFC, or Cat-a-lot to remove the temporary maintenance category
    • External tool, maintained on English Wikipedia. Runs on the local computer using Windows Vista and later. Very powerful using No Limits Plugin, so right to use it on Commons must be requested at COM:RFR#AutoWikiBrowser access, and rights to use it most effectively on Commons must be requested at COM:BRFA (to run unattended in bot mode) and COM:RFA (to delete files in Admin mode).
  • Others?
  • Proposal type: feature request
  • Proposed solution:

We could have a common "building block" or component that would embrace all of the current methods of selection. Tools could then be re-implemented to take advantage of that. Or, each of the existing selection mechanisms could be abstracted, with a clean interface providing a list of selected files, allowing "mix and match" for selection method and what tool you are using.

  • Phabricator ticket:
  • Further remarks:

Not all methods would necessarily make sense for all tools, so there needs to be some ability to turn features on and off when using the building block.
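One way to picture the "mix and match" idea is a set of small selector functions that all share one interface and can be chained. The following Python sketch works on an in-memory list of pages; the `Page` class and the selector names are hypothetical and do not correspond to any existing Commons tool or API.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Page:
    title: str                                   # e.g. "File:Apple.stl"
    categories: FrozenSet[str] = frozenset()     # category names without prefix


# Every selection method has the same shape: pages in, filtered pages out.
Selector = Callable[[Iterable[Page]], List[Page]]


def in_category(name: str) -> Selector:
    return lambda pages: [p for p in pages if name in p.categories]


def in_namespace(prefix: str) -> Selector:
    return lambda pages: [p for p in pages if p.title.startswith(prefix + ":")]


def title_matches(pattern: str) -> Selector:
    rx = re.compile(pattern)
    return lambda pages: [p for p in pages if rx.search(p.title)]


def excluding(titles: Set[str]) -> Selector:
    # Covers the "except these" case that regex-only selection lacks.
    return lambda pages: [p for p in pages if p.title not in titles]


def combine(*selectors: Selector) -> Selector:
    """Apply the given selectors one after another."""
    def run(pages: Iterable[Page]) -> List[Page]:
        result = list(pages)
        for selector in selectors:
            result = selector(result)
        return result
    return run
```

A tool would then enable only the selectors that make sense for it, for example a regex match plus manual exclusions for a rename job, and receive a plain list of pages to act on.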

Discussion

Proposed by - Jmabel ! talk 01:02, 4 January 2024 (UTC)

i think, cat-a-lot can be understood as being capable of "selecting from a search result". you can do that on a special:search page.
this building block mechanism would be really useful. it could be used for new tools like to licence review files from the same batch. RZuo (talk) 06:53, 4 January 2024 (UTC)
Corrected Cat-a-lot + search. - Jmabel ! talk 20:06, 4 January 2024 (UTC)

@Jeff G.: I don't use AWB, so this is a question from pure ignorance. For AWB, in the bullet point "Can select file pages to be acted upon?", you mention using AWB itself to add a temporary category to indicate selection for AWB. If it can do that at all well, why can't you use the same approach just to select the file pages to be acted upon? - Jmabel ! talk 19:46, 5 January 2024 (UTC)

@Jmabel: One can use multiple passes with different tools to do different things. For instance, I once had a request special:diff/806250451 to fix the spelling "univeristy" as highlighted in this search, which showed many files including that in the filename, as well as many files including that in the wikitext. I was able to fix the filenames as described at "second renaming job" on User:Jeff G./massrename. For the wikitext, it wasn't just in file description pages, it was also in namespaces 0, 12, 14, 100, and 106, so VFC was out. I didn't categorize with that portion of that job, but I could have; instead, I just managed the internal list, making sure not to "fix" the request on my user talk page or any other talk pages, noticeboards, or archives. I documented the result at special:diff/807068043. So yes, AWB can add and remove cats, but it's not the best tool for doing that quickly in my tool chest. What I use it for nearly every day is purging the files in Category:Incomplete deletion requests - missing subpage and then that cat and pages which rely on it, because when I don't do that, files that shouldn't be there tend to linger there.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 19:46, 6 January 2024 (UTC)
I'm not following that, probably because it's a tool I don't use. No need for us to discuss it further here, though. - Jmabel ! talk 01:01, 7 January 2024 (UTC)

PD vs. URAA

Files in the public domain in many countries are at risk of deletion because of some obscure US-policy (URAA)

  • Problem description:
Files that are in the public domain in other countries and are uploaded to Commons, the central file repository, are at risk of deletion just because a single country has some obscure regulation that nobody outside can fathom.
Here for example are files in the public domain in Europe, completely legitimate files for most of the world and thus for most of the Wikiverse, but because of this obscure URAA they are at risk of deletion.
  • Proposal type:
feature request / process request
  • Proposed solution:
Either auto-magically move those files (and related/in the same category) to the Wiki-Projects that legitimately use them (or could use them), or move the server to a country that doesn't have to adhere to this strange regulation.
Definitely don't just delete those files and thus destroy free content.
  • Phabricator ticket:
  • Further remarks:

— Preceding unsigned comment added by Sänger (talk • contribs) 13:16, 2 January 2024‎ (UTC)

Discussion

  •  Comment I think this proposal is beyond what this survey can cover. Moving the WMF headquarters and servers to another country with better copyright laws needs to be discussed on Meta and with the WMF boards. GPSLeo (talk) 13:58, 2 January 2024 (UTC)
    That's the rather massive solution, but not the needed one. It would be sufficient if the files at risk could be moved to a safe place in some other project in the Wikiverse. It's just that nobody in another country could fathom such completely strange stuff, and imho the WMF has a) tons of money and b) lots of employees, so they could look for a good solution for the main world outside the small USA. Grüße vom Sänger ♫ (talk) 14:08, 2 January 2024 (UTC)
    This is not against your proposal itself; we definitely need a solution, also as the US becomes more and more authoritarian. It is just too large for this survey. The solution of just moving the files to other Wikimedia wikis does not work, as the requirement in foundation:Resolution:Licensing policy that files comply with US law also applies to them. GPSLeo (talk) 15:09, 2 January 2024 (UTC)
    But there are clearly copyrighted pictures stored on enWP, because of the also obscure "Fair Use" stuff.
    There are pictures stored on deWP that don't comply with some other obscure American regulations.
    So that assertion is definitely not true, exemptions are quite standard. Grüße vom Sänger ♫ (talk) 16:43, 2 January 2024 (UTC)
    You could probably get away with it in other countries besides the United States that have fair use, but I don't know if there are any. With the exception of maybe Italy, but no one from Italian Wikipedia wants to embrace it. You can't really upload files to other projects under the guise of fair use if there is none in the country in question though and images would still have to comply with the URAA regardless. Since as far as I know the United States would still honor it if someone sued in an American court even if the file isn't technically being hosted there. --Adamant1 (talk) 03:54, 3 January 2024 (UTC)
    Those pictures are PD everywhere but in the USA, because of some very strange regulation. Just because a single country blocks the otherwise completely legal use of those data the whole world gets no access to this free content. That's everything but providing the essential infrastructure for free knowledge, that's restricting it to the whims of one random country. Grüße vom Sänger ♫ (talk) 13:25, 4 January 2024 (UTC)
    @Sänger: It's not really "at the whims" of the United States when other countries agreed with it and were involved in drafting the law. Although even if that were the case it's not like the copyright laws of other countries don't have a similar effect. Why should I as an American have to not see or use images of works that were created and copyrighted in a country like Mexico where the term is totally ridiculous? That's life though. Even if I think waiting until 100 years after a Mexican artist has died to upload their work is totally stupid. --Adamant1 (talk) 10:05, 5 January 2024 (UTC)
  •  Oppose, nothing we can do.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 13:33, 4 January 2024 (UTC)

TimedText

Description of the problems

  • Problem description:
  1. need an easy/user-friendly way to categorise timedtext. beneficial for categorising based on languages, quality of transcript, etc.
  2. need an easy way to check all timedtext pages associated with a file. something similar to https://commons.wikimedia.org/w/index.php?oldid=828200732#L-166 .
  3. need a more intuitive way of going to the associated file on a timedtext page. currently it's by ctrl+click the file (the mediaplayer box), or open up the popup and click the circle i. i needed this so much that i wrote a script before i learnt the ctrl+click trick https://commons.wikimedia.org/w/index.php?oldid=828200732#L-159.
  4. a way to assess the quality of timedtext (similar to wikisource?). incomplete, transcribed, non-synchronised, proofread, verified...?--RZuo (talk) 23:59, 31 December 2023 (UTC)
  5. a tool/interface that helps transcription, something like https://www.nikse.dk/subtitleedit/online .--RZuo (talk) 07:07, 2 January 2024 (UTC)
  • Proposal type: feature request
  • Proposed solution:
  • Phabricator ticket:
  • Further remarks:

Discussion

  •  Oppose You did not explain why this would be useful and why there are these needs. Also 4 can already be done via file categories. Opposing for now since this so far doesn't seem to be anywhere near the most important issues and can to a large degree already be done; very many other issues would be more important and haven't been listed here. --Prototyperspective (talk) 11:17, 1 January 2024 (UTC)
    can you point to me an english timedtext that's incomplete, and an english timedtext that's been proofread, based on your claim that "4 can already be done via file categories"? RZuo (talk) 11:46, 1 January 2024 (UTC)
    I said it can already be done, not that it is already being done and I would encourage such to be done, especially if machine translation / auto-caption tools are leveraged for WMC multilingualism (which could be very impactful). However, I can also point you to an example: Category:Videos by Terra X with English subtitle file unchecked – these need proofreading (see the cats above for more). I think people usually just upload timedtexts that are already complete but a new category for incomplete ones would be useful.
    1. also is already being done with cats like "…with subtitles in English". Prototyperspective (talk) 16:13, 1 January 2024 (UTC)
as i've tested at TimedText:Sandbox.webm.en.srt, timedtext pages can be categorised in the same way as other pages, but hotcat doesnt work on tt pages, so it's cumbersome. which is why i said we "need an easy/user-friendly way to categorise timedtext". the most basic solution is to make hotcat work on tt pages.
but traditional categorisation method is inferior to the assessment structure in wikisource, which i think is a lot easier to use (just clicking the coloured dots) and provides a standard classification.
then this reminded me of the need to have a transcription tool, because transcribing audio/video is different from a text. transcribing audio/video requires pausing the playback and setting timestamps.--RZuo (talk) 07:07, 2 January 2024 (UTC)
  • Regarding 3. A patch for this is already coming —TheDJ (talkcontribs) 10:19, 2 January 2024 (UTC)
  • Regarding 4. You can always just use talk pages. Just like Wikipedia uses talk pages for wiki project assessments. —TheDJ (talkcontribs) 10:17, 2 January 2024 (UTC)
  • Regarding 5. Heavily suggest that this is a case of "external specialised services are better than build and maintain our own service". We used to have Amara integration and for the few years that that worked, it was pretty ok. Finding a good online editor, hosting it on Toolforge and adding a few integrations is going to be way more maintainable than trying to ram yet another component into Mediawiki. —TheDJ (talkcontribs) 10:16, 2 January 2024 (UTC)
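Points 1 to 3 of the proposal all depend on mapping a timedtext page to its file and language. Going by the example title TimedText:Sandbox.webm.en.srt mentioned above, a title appears to consist of the file name, a language code and a subtitle format. The Python sketch below assumes exactly that pattern; the function names are hypothetical, and titles that do not follow the pattern are rejected rather than guessed at.

```python
from typing import Dict, Iterable, List


def parse_timedtext_title(title: str) -> Dict[str, str]:
    """Split 'TimedText:<file name>.<language>.<format>' into its parts."""
    prefix = "TimedText:"
    if not title.startswith(prefix):
        raise ValueError("not a TimedText page: %r" % title)
    # rsplit from the right so that dots inside the file name are preserved;
    # unpacking raises ValueError if language or format is missing.
    name, language, fmt = title[len(prefix):].rsplit(".", 2)
    return {"file": "File:" + name, "language": language, "format": fmt}


def languages_by_file(titles: Iterable[str]) -> Dict[str, List[str]]:
    """Group timedtext pages by their associated file (point 2 of the proposal)."""
    result: Dict[str, List[str]] = {}
    for title in titles:
        info = parse_timedtext_title(title)
        result.setdefault(info["file"], []).append(info["language"])
    return result
```

The same parsed language code could feed a per-language category (point 1), and the `file` value is the link target asked for in point 3.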

Video with multiple audio tracks

Description

  • Problem description: i was just thinking of making a video tutorial for v2c, or it could be a cooking recipe. in consideration of different languages, it would be best to make a video suitable for audio commentary in all languages. then other users can make their dubs and add them to the videos.

    here comes the problem. to add additional soundtracks to a video, the whole video has to be remuxed. it's a daunting task for many people, and each new edit would mean creating new versions of a video, i.e. lots of redundant big files.--RZuo (talk) 18:40, 29 December 2023 (UTC)

  • Proposal type: feature request
  • Proposed solution: additional soundtracks can be hosted separately, and during video playback, users can choose which tracks to play. that would allow playing different languages, or enjoying a movie (with the original sound) while listening to a commentary soundtrack.

    youtube is just experimenting with multiple tracks https://support.google.com/youtube/thread/129769858/updates-to-captions-and-audio-features-on-youtube .--RZuo (talk) 18:40, 29 December 2023 (UTC)

    i think i should clarify my concept.
    it's probably not to enable a video file with multiple soundtracks, because adding soundtracks to a video is harder?
    it's to have a video file (with or without a single soundtrack), and separate audio files that are soundtracks to this video.
    during playback, users can select which soundtrack to play, just like how users can select which timedtext file for cc now. RZuo (talk) 10:44, 7 January 2024 (UTC)
  • Phabricator ticket:
  • Further remarks: copied from https://commons.wikimedia.org/w/index.php?title=Commons:Idea_Lab&oldid=836423569#Video_with_multiple_audio_tracks .

Discussion

  •  Support It's a large strain on storage space and just inconvenient and a burden on users and contributors to have separate video files for each language rather than audio tracks. See the categories about redubbed explanatory videos in this new cat as an example of how this can be useful: Wikimedia projects and audio.
It would make the site more multilingual, improve global education, access, reduce storage requirements (example) etc. Moreover, there really should be machine translated video captions for all languages where that's feasible, people could use that for manmade captions, for example because the texts only need to be edited and are already set to the right timings. That's a separate issue though. Moreover, once a video has captions and separate audio tracks, AI-generated voice, which recently dramatically improved, could be used for auto-redubbed videos (audio-tracks per language) – sometimes also manually renarrated videos by WMC narrators – which could substantially improve global education and the usefulness of files of WMC. That's also another issue. Copied this over from Commons:Idea Lab. However, I wouldn't consider this a top important issue, just an important one where the sooner it's done the better and one where the potential benefit can be large mainly due to easily redubbed videos. --Prototyperspective (talk) 23:50, 29 December 2023 (UTC)

Wall of Images View for category pages (incl sorting & subcats)

Description of the Problem

  • Problem description: Most other major comparable media websites (example) have a wall of images view on sites, just not WMC. For some reason the only way to get this here is the search results. However, there is no way to have this on category pages which are a) often more useful than using the search results and b) can be linked to and indexed by Web search engines. Moreover, it is often difficult to find good/relevant images from a category page because they are buried in (often seemingly arbitrary) subcategories (examples below) and/or among a very large number of other (often outdated) files that clutter the pages.
Categories can often be more useful than search results if the searched image is not within the results or you're looking for a specific subject that has a category / you know of (or navigated to) the category you know should contain the image. Also the search results can be sorted in a way that very low quality and irrelevant results are at the top while high-quality ones are buried far down. Often, many images are not in the results – for example because the file description does not include some specific word. Other than the search results page the UI seems rather antiquated with little regard for UX and practical usefulness. A wall of images view is especially useful in overflowing cats (containing e.g. mostly outdated charts) or cats with very long branches of subcats (e.g. by subtaxa) one can't all browse through. Currently WMC is not that useful and popular – I think implementing roughly what's proposed here would be one of the top effective ways to change that not far from improving Web search engine indexing which could be tied to the usability of WMC category pages.
Explanations with examples:
  • Brief: When looking for good-quality images for rivers from above I don't want to go through all the subcats of this; same for this...there's nothing but relevance (e.g. up-to-date charts) or quality I'm looking for, not any specific river.
  • Longer: Another usefulness-case among many is that I'd like to scroll through interesting cats Category:Microscopic images relating to biology to find interesting/high-quality images rather than going through all of the many subcats without any sorting. And there are cases where images are buried deep down in arbitrary seemingly irrelevant cats – for example when looking for a high-quality picture of a person on a ladder I don't want to go through Category:People on ladders by country or for that of a fitting pic of a bee (any bee) on a flower having to go through each cat in Category:Bees on flowers instead of scrolling (in this case there just are many still unsorted ones but often there are no images at that level).
There are many other strange or at that category relatively irrelevant criteria according to which files are buried in subcats and then missing at the cat above so it'll be hard to create other branches. For example in Category:People exercising (w. equipment) I'd like to create subcategories for the different exercises people are doing so people can use/find these in educational pages about these exercises but people have already begun moving (not copying) them to subcategories distinguishing by 'gender', images may eventually end up under cats like "Women exercising‎ in India by city" but then are missing at all categories above despite being the best image for an exercise and gender not being the key or only defining criterion by which to organize these. One further problem among many is that you also can't see recent uploads or up-to-date charts+maps at the top even in categories with many files such as this.
  • Proposal type: feature request
  • Proposed solution: A toggle button that switches the page to a wall of images that looks like the search results. The images are sorted so that:
    • Recently uploaded images are relatively high up
    • The longer you scroll down the less used images are, images that are used e.g. twice on English WP and elsewhere are relatively high up (featured images and so on are also leveraged for this)
    • Year categories like Category:Charts by year of latest data or Category:2023 maps of the world are leveraged to show more up-to-date images higher up
    • Things like whether a Template:Factual accuracy is set could also be used
    • When you search for "People on ladders" you get a lot of images by the same uploader of the same scene at the top which aren't even showing what was searched for – instead use the category for what has been searched for (in this case a 1:1 text match) and sort the files so that it shows many different images at the top
    • Images directly in the cat are mostly high up but when scrolling further images of the next 3 or so levels of subcategories also show up
    • Images are also sorted according to what the user configures using filters (the above is the default and altered accordingly) – currently the filters are only available for the search results but one can't sort images in a category e.g. by recency
  • Phabricator ticket:
  • Further remarks: Not via Deepcat (not possible anyway due to API limits) but a default-enabled view mode even occasional users can easily toggle on. See the search results page for how it could look once one has clicked on the WoI-mode button. --Prototyperspective (talk) 17:38, 29 December 2023 (UTC)
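
The sorting rules listed in the proposed solution could be sketched roughly as a single score per file. This is only an illustration: the field names, the weights and the `max_depth` default are assumptions made up for this sketch, not an existing MediaWiki or Commons API, and the "many different images at the top" diversity rule is not covered.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FileInfo:
    """Hypothetical per-file metadata; none of these names come from MediaWiki."""
    title: str
    uploaded: date
    usage_count: int          # number of wiki pages that use the file
    featured: bool            # featured or quality image
    data_year: Optional[int]  # year taken from a "by year" category, if any
    disputed: bool            # carries a factual accuracy template
    depth: int                # 0 = directly in the category, 1+ = subcategory level


def score(f: FileInfo, today: date) -> float:
    """Higher score means the file is shown further up. Weights are arbitrary."""
    age_days = max((today - f.uploaded).days, 0)
    recency = 1.0 / (1.0 + age_days / 365.0)   # recent uploads rank higher
    usage = min(f.usage_count, 10) / 10.0      # capped so heavy use does not dominate
    s = 2.0 * recency + 3.0 * usage
    if f.featured:
        s += 1.0
    if f.data_year is not None:
        s += 1.0 / (1.0 + max(today.year - f.data_year, 0))  # newer data ranks higher
    if f.disputed:
        s -= 2.0
    s -= 0.5 * f.depth                         # files from deeper subcategories sink
    return s


def wall_of_images(files: List[FileInfo], today: date, max_depth: int = 3) -> List[FileInfo]:
    """Keep files up to max_depth subcategory levels and sort by descending score."""
    visible = [f for f in files if f.depth <= max_depth]
    return sorted(visible, key=lambda f: score(f, today), reverse=True)
```

User-configured filters, as mentioned in the last bullet, would then replace or adjust this default score.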

Discussion

Categories on mobile

Description of the Problem

  • Problem description: This problem simply is that the mobile website doesn't show files' categories.
    • Lots of contributors spent lots and lots of time categorizing images…they didn't volunteer their time doing so for no reason: categories are (or could be) actually useful in many ways, for example:
      • they often provide additional quick information about the file (such as a year or location not in the description or buried in long text or additional meta-info)
      • they are useful for organization and can be browsed through if the current image (/category) is not exactly what one has looked for but close to the intended/useful category branch (for example: find a similar image via the search engine then click on the category you forgot the name of or didn't know exists)
      • they enable users to find related images (e.g. of a set or of the same subject) as well as finding more images (discoverability)
    • There are many other ways categories can be useful. They're there for a reason and shouldn't only be accessible to desktop users locking out the majority of WMC viewers who use mobile.
  • Proposal type: bugfix / feature request
  • Proposed solution: Just show the categories on the mobile website commons.m.wikimedia.org. They are shown in the apps and on desktop. They should be shown without having to tap on anything and could merely be links at the bottom.

A nice navigable cat tree explorer module could be possible too but more important would be to at least display them somehow. I think many users would use categories as their main way to use WMC, that's how I mostly use it, and it makes no sense to refuse them access to our category pages.

  • Phabricator ticket: (If one exists it's probably stalled for many years despite that all that is needed is briefly showing a few links at the bottom.)
  • Further remarks:

--Prototyperspective (talk) 23:13, 28 December 2023 (UTC)

Discussion

  •  Strong support - categories are essential to organize and access Commons' content. --Fl.schmitt (talk) 23:19, 28 December 2023 (UTC)
  •  Strong support. Should be almost trivial to implement the links. The lack of this constantly frustrates me when I try to use Commons on mobile: I continually have to break out to the desktop interface to access categories. I understand that good display of the category pages themselves on mobile might be a little tricky (but only a little, I would think), but even having links to anything that even vaguely works for categories would be an enormous improvement. - Jmabel ! talk 23:52, 28 December 2023 (UTC)
  •  Strong support per Fl.schmitt and Jmabel. --Adamant1 (talk) 02:03, 29 December 2023 (UTC)
  •  Support.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 02:05, 29 December 2023 (UTC)
  •  Oppose Sorry to say this, but categories are mostly “inside baseball” for Commons regulars. People from outside don't find images that way and I'd be very sad if we spent precious developer time on this. --Frank Schulenburg (talk) 02:47, 29 December 2023 (UTC)
    @Frank Schulenburg: You're describing exactly a part of the problem - currently, "people from outside" using mobile apps simply aren't able to use categories to access content on Commons if there's no link/information as proposed by @Prototyperspective. With a category link, "people from outside" would be easily able to find more relevant media, e.g. providing different perspectives of an object, thus fulfilling Commons' educational purpose. Fl.schmitt (talk) 07:18, 29 December 2023 (UTC)
    Frank's point is of the order of "People use images via google, not via categories". Categories are for editors. Editors think that outsiders will use them, because we editors are neurotic data sorters that were mostly born before the year 2000 and think that this is how other people work as well. But it's not. Everything that we do here is vastly deviant from 'normal' behavior. And in the grand scheme of things, the fact that Google doesn't index our images is most likely a factor 1000 more impactful to the educational purpose than anything else. As such, it's inside baseball. Important, but not to the extent that it will majorly change Commons. —TheDJ (talkcontribs) 14:14, 29 December 2023 (UTC)
    I think Web search engine (Google and DuckDuckGo mainly) indexing is one of the few top issues.
    That doesn't make this issue less important, especially considering that one could at least show the links which doesn't require a large effort but mainly because Web search engines could nicely index category pages as well.
    Moreover, outsiders could benefit from categories; a good point may be that they wouldn't necessarily use them by going to them from the bottom of file pages – that's why there could be a more visible intuitive way they could be included there and why they could show up at the top of the search results (that's a separate issue for another time). Lastly, it's just speculation, ignores the value of this to what he calls "Commons regulars", and it's obvious they "don't find images that way" if categories are as hidden as they currently are where most occasional users probably don't even know they exist. Doubling the viewcounts of category pages for example would make WMC much more useful, it may not change its internal design but it makes the efforts of contributors much more valuable and substantially increases the usefulness of the site. Prototyperspective (talk) 14:53, 29 December 2023 (UTC)Reply[reply]

Checkbox to mark new files as current on upload

Description of the Problem

  • Problem description:

As some of us know well, there has been some controversy about file overwrites on Commons. Files can no longer be overwritten by any user as before, but files with the "Current" template can. The problem is that many users don't know about this template, and it can be very difficult for new users to use.

  • Proposal type: feature request

  • Proposed solution:

Checkbox to mark a file as "Current" (or not) on upload. Creating a file redirect to have a versioned file could also be facilitated (for example, a "Versioned file" check: a date is added at the end of file's name, and a file redirect with the original name, redirecting to it; a component in file's page could make easy to upload a new version as a separate file, while updating the redirect so it points to the new file now).

  • Phabricator ticket:
  • Further remarks:

Some users complain that "Current" isn't the most intuitive name for such a template. Perhaps a new, better name should be agreed on.

Discussion

I totally supported restricting file overwrites, but I think that things should be made easier for users on files that do need overwrites. This will work in favor of the overwrite restriction, since it will face much less opposition. I fear that too many complaints could eventually roll back that good change. MGeog2022 (talk) 11:03, 23 December 2023 (UTC)

@MGeog2022: It doesn't work the way you think it should. In order to be able to overwrite, either the user has to have autopatrol+ or the file description page has to have {{Allow Overwriting}} on it. I spent days making sure that files marked with {{Current}} at the time (12 November 2023) got {{Allow Overwriting}} with the permission in Special:Diff/820992375. The whole sordid mess is documented in Commons talk:Overwriting existing files#Updating maps. I am not planning on doing that again, and doing so would require more complications to avoid double additions.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 00:58, 29 December 2023 (UTC)
@Jeff G., sorry, I thought that files marked as "Current" worked "the old way", and could be updated by any user. I see that the "Allow Overwriting" template is to be used only by patrollers, so this proposal makes no sense. I withdraw it (only the "Versioned file" part would make sense, but it's something a bit complex). I hope that the frustration of some users with overwrites will gradually disappear. Only one question: is there any reason not to allow overwriting of a specific file by all registered users, even if its uploader explicitly wants it? (I understand that in this case he/she assumes all associated risks) MGeog2022 (talk) 12:37, 29 December 2023 (UTC)
Well, after reading the response by RZuo, then perhaps this proposal would make sense for files with "Recent" or "Update", but not "Current" (this also answers my previous question). MGeog2022 (talk) 12:40, 29 December 2023 (UTC)
i copy my comment over here from https://commons.wikimedia.org/w/index.php?title=Commons_talk:Overwriting_existing_files&oldid=836174261#Updating_maps .
those with {{Recent}} and {{Update}} are suitable for overwriting, but those with {{Current}} mostly don't actually need overwriting, because:
  1. {{Recent}} means the file is expected to be constantly updated to reflect the recent real world.
  2. {{Update}} means something really needs to be updated.
  3. {{Current}} means the file reflects the current situation; the current situation may change, so it may be updated; but in reality most files with this template don't ever need to be updated because nothing they depict will change (at least in the foreseeable future).
RZuo (talk) 10:36, 29 December 2023 (UTC)
@RZuo: You never answered my question "Are the recent and update ones also current?" there. Perhaps I need to be more specific. @GPSLeo: Should we "Allow Overwriting" for files bearing the {{Recent}} and {{Update}} tags (that is, files in Category:Most recent version and Category:Images requiring an update)?   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 15:12, 29 December 2023 (UTC)
I think yes, they should also get the overwrite template. But I think we should consider merging {{Current}} and {{Recent}}, as I do not see a real difference between them. For {{Update}} I think it is best to use it in addition to the other ones and remove it after the update is done, while leaving the other template and also the {{Allow Overwriting}}. GPSLeo (talk) 15:36, 29 December 2023 (UTC)

Priority thumbnail render queue

Description of the Problem

  • Problem description: Pages with many new thumbnails, such as Special:ListFiles shortly after uploading, newly created galleries, or files viewed in the MediaViewer on larger screens, can run into the rate limit for thumbnail generation. This results in missing thumbnails or a missing image in the MediaViewer.
  • Proposal type: feature request
  • Proposed solution: Currently the limits are the same for logged in users and not logged in users. There should be a second priority thumbnail rendering queue for autoconfirmed users. If this is not sufficient there could also be a third queue for admins and bots only.
  • Further remarks:
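
A tiered render queue of the kind proposed could look roughly like the sketch below. It is only an illustration: the group names, the `ThumbnailQueue` class and its methods are hypothetical and are not taken from MediaWiki or its thumbnailing service.

```python
import heapq
import itertools

# Lower number = served first. The tiers follow the proposal: admins and bots,
# then autoconfirmed users, then everyone else. The names are assumptions.
PRIORITY = {"admin_or_bot": 0, "autoconfirmed": 1, "anonymous": 2}


class ThumbnailQueue:
    """Priority queue of thumbnail jobs; first in, first out within each tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, user_group, file_name, width):
        # Unknown groups are treated like anonymous users.
        rank = PRIORITY.get(user_group, PRIORITY["anonymous"])
        heapq.heappush(self._heap, (rank, next(self._counter), file_name, width))

    def next_job(self):
        """Return the next (file_name, width) to render, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, file_name, width = heapq.heappop(self._heap)
        return file_name, width
```

A strict priority order like this can starve the lowest tier under load, so a real implementation would probably also need per-tier rate limits or ageing of waiting jobs.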

Discussion

Fix MediaViewer's inability to handle collaborations

Description of the Problem

  • Problem description:

When two or more creators created a work of media, MediaViewer will always reduce this to the first person with a Creator template. It functions perfectly well when names are given in text alone, however. For example:

This should list two creators, Humanité René Philastre and Charles-Antoine Cambon. It instead will only list one. There is, however, no single creator who's more "correct" than the other: This was a collaboration. This is, of course, misattribution, which is especially bad since MediaViewer presents itself as if it can provide an accurate credit line. This is not a new issue: The problem has been known about for a decade now, and really, really needs at least some resources thrown at it.


Alternatively, consider just defaulting to "See file description page" with a link there in any case where the conversion isn't trivial.


  • Proposal type: bugfix
  • Proposed solution:

I'd suggest it'd be easiest to move forwards by allowing two or more creator templates to be used: A Creator template and additional text breaks as well, but that, at least, can be worked around by making a Creator template (or Creator-like template) for additional people. Personally, I would suggest at least four Creator templates, because I can point to cases with four names easily: File:Edward_Duncan_-_The_Explosion_of_the_United_States_Steam_Frigate_Missouri.jpg.

  • Phabricator ticket: T68606 - (from 2014!!!)
  • Further remarks:
    • The other thing that doesn't work well in the MV is dates with a precision coarser than days (e.g. months, years), which are displayed only as "1 January XXXX". — Draceane talkcontrib. 15:55, 22 December 2023 (UTC)
      For variable values of "January", or would even 2020-08 show as "1 January 2020"? - Jmabel ! talk 18:50, 23 December 2023 (UTC)
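
The behaviour asked for above (list every creator, fall back to a link to the file description page when nothing can be extracted, and do not invent a day or month for imprecise dates) could be sketched as follows. The function names and input shapes are assumptions for illustration only; this is not MediaViewer's actual code.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def credit_line(creators):
    """Join all creator names; fall back to a pointer to the description page."""
    names = [c.strip() for c in creators if c and c.strip()]
    if not names:
        return "See file description page"
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def format_date(iso):
    """Format 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' without adding missing precision."""
    parts = iso.split("-")
    year = parts[0]
    if len(parts) == 1:
        return year
    month = MONTHS[int(parts[1]) - 1]
    if len(parts) == 2:
        return f"{month} {year}"
    return f"{int(parts[2])} {month} {year}"
```

With this sketch, the two collaborators from the example are both credited, and a date given as 2020-08 is shown as "August 2020" rather than as a specific day.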

Discussion

Fix rsvg text alignment regression

Description of the Problem

  • Problem description:

The latest thumbnail-image-maker (named rsvg) unfortunately has a bug which misaligns centre- or right-aligned text tags containing tspan tags on the same line. Many existing files have been affected.

  • Proposal type: bugfix
  • Proposed solution:

Fix rsvg or use a version without the bug.

  • Phabricator ticket:

http://phabricator.wikimedia.org/T97233

  • Further remarks:

@Glrx: described the root cause as follows:

The problem is computing the width of an SVG "text chunk". If the text chunk consists of multiple XML nodes, then librsvg is using the width of the last node as the width of the entire text chunk. (librsvg is correctly tossing out the initial and final whitespace for the text element.)
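
The effect of that root cause can be illustrated with a small model. This is only a sketch of the arithmetic described above, not librsvg's code: the function name and the example widths are made up for illustration.

```python
def anchor_start_x(x, node_widths, text_anchor, buggy=False):
    """Return the x position where a text chunk starts being drawn.

    x            -- the x attribute of the <text> element
    node_widths  -- rendered widths of the XML nodes making up the chunk,
                    e.g. leading text followed by a <tspan>
    text_anchor  -- "start", "middle" or "end"
    buggy        -- if True, use only the last node's width, as in the bug report
    """
    if not node_widths:
        chunk_width = 0
    elif buggy:
        chunk_width = node_widths[-1]   # width of the last node only
    else:
        chunk_width = sum(node_widths)  # width of the whole chunk
    if text_anchor == "middle":
        return x - chunk_width / 2
    if text_anchor == "end":
        return x - chunk_width
    return x


# Centred text at x=100 made of a 60-unit text node and a 40-unit tspan:
correct = anchor_start_x(100, [60, 40], "middle")            # 50.0
shifted = anchor_start_x(100, [60, 40], "middle", buggy=True)  # 80.0
```

In this model the buggy start position lies 30 units too far to the right, which matches reports of centred text running off the right margin.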

Discussion

  •  Strong support In practical terms: This bug causes <text> elements containing <tspan > sub-elements (for subscripting, italicizing, boldfacing, coloring, font-sizing, etc.) to be rendered in the wrong place in Wikipedia thumbnails and on Commons file description pages—despite rendering properly within browsers during development. Example: text that should be centered on the page, runs off the right margin. Over years, this bug has required me to revise a few dozen .svg images to specify all attributes within each <text> element, or even to compromise content to work around rendering problems. RCraig09 (talk) 06:32, 21 December 2023 (UTC)Reply[reply]
— Examples: see earlier versions of these images: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14. — RCraig09 (talk) 22:34, 26 December 2023 (UTC)
@RCraig09: Another buggy file for your records. Cheers, cmɢʟee ⋅τaʟκ 06:15, 2 January 2024 (UTC)
Yes, User:Minoa, I think that is the same problem. Having <tspan font-size="18">APPROVED</tspan> embedded in a <text> specification falls prey to this bug. RCraig09 (talk) 07:12, 22 December 2023 (UTC)
 Strong support: this has to be fixed, because the problem affected two of my uploads and made centring of text with different font sizes tedious. --Minoa (talk) 03:18, 24 December 2023 (UTC)
Thanks, @Minoa: not just tedious but strictly impossible, if the specified font is unavailable and another is substituted. cmɢʟee ⋅τaʟκ 01:36, 25 December 2023 (UTC)
I personally recommend changing to a faster and less buggy renderer, see phab:T40010; more details can be found at: User:JoKalliauer/SVG_test_suites  — Johannes Kalliauer - Talk | Contributions 14:16, 25 December 2023 (UTC)
  •  Strong support I have come across SVGs here on Commons where the author converted the text to paths as a kludge workaround for this problem, increasing the file size 10x. - Wikkiwonkk (talk) 16:17, 29 December 2023 (UTC)
 Support as a graphist from COM:GLI --Designism (talk) 18:46, 30 December 2023 (UTC)

Video conversion support

Description of the Problem

  • Problem description: New users who attempt to upload videos in MP4 format, a very common video format, are met with an error message that provides no guidance on how to convert the video to an acceptable format. The current video conversion tools, such as Video2commons, are frequently nonfunctional. This likely results in many users not uploading files that would be of use to Commons.
  • Proposal type: feature request
  • Proposed solution: The file upload wizard should offer to perform a conversion to an acceptable file type whenever a user attempts to upload an MP4 file.
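
As a rough idea of what such a conversion step might do, the sketch below detects an MP4 upload and builds an ffmpeg command line that re-encodes it to WebM (VP9 video, Opus audio). It assumes that ffmpeg would be available on the converting machine and that WebM is the target format; the helper names are hypothetical and nothing here reflects how UploadWizard or Video2commons actually work. The command is only built, not executed.

```python
from pathlib import Path


def needs_conversion(file_name):
    """True if the upload looks like an MP4 file, judged by its extension only."""
    return Path(file_name).suffix.lower() == ".mp4"


def build_webm_command(src):
    """Return an ffmpeg argument list converting src to a .webm file next to it."""
    dst = str(Path(src).with_suffix(".webm"))
    return [
        "ffmpeg",
        "-i", src,               # input file
        "-c:v", "libvpx-vp9",    # VP9 video encoder
        "-crf", "30",            # constant-quality target (lower = better quality)
        "-b:v", "0",             # required for pure constant-quality mode with VP9
        "-c:a", "libopus",       # Opus audio encoder
        dst,
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed; the quality setting of 30 is an arbitrary example value.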

Discussion

Courtesy pinging Sannita (WMF), who has discussed this with us. Cheers, {{u|Sdkb}}talk 05:57, 19 December 2023 (UTC)

UploadWizard to add SDC

Description of the Problem

  • Proposal type: feature request: the Upload wizard should include some basic structured data (which are not depicts) or prepopulate SDCs in the last step of the upload for the user to confirm
  • Proposed solution: After an upload with the upload wizard all information after this diff [2] is already included
  • Further remarks:

Discussion

Supporting as proposer and also as operator of User:SchlurcherBot who does exactly these edits and could otherwise focus on the less well understood SDC cases. --Schlurcher (talk) 08:19, 19 December 2023 (UTC)

Massive support for video (or not)

Description of the Problem

  • Problem description: I believe we (Commons) need directive from the WMF as to whether it is even practical to have support for a large quantity of large video files. This question becomes more pressing as we start to see more and more commercially-made films, many of which have been digitized, come into the public domain. I suspect that whether we can reasonably host (and stream on demand) any large number of such films is mainly a technical and budget consideration, and I'd like feedback from the WMF as to what is feasible, so that we don't either waste our time discussing proposals that would be impossible to implement, or (worse yet) go ahead with adding a lot of content that we cannot adequately support and frustrating users by a half-assed implementation. - Jmabel ! talk 20:05, 18 December 2023 (UTC)Reply[reply]
  • Proposal type: process request
  • Proposed solution: clarity from WMF
  • Phabricator ticket:
  • Further remarks:

Discussion

In my opinion, historic movies (or other videos, such as documentaries or TV images) that enter the public domain, provided that they have enough value, are part of the kind of content that Commons should store. The Wikimedia Foundation had a revenue of $154.7 million in 2022, while the Internet Archive, in 2019, had a budget of only $36 million. The Archive tries to host as much content as possible (probably a mistake, and very possibly a big one). Commons, on the other hand, stores only content that is deemed educational (this includes any historic content, including movies). Commons is not an archive, strictly speaking, but insofar as it stores historic material, it can, in fact, be considered an archive. And an archive that is part of something far greater, the sum of all human knowledge, that also has other archive-like components (such as Wikisource). I think that selected videos of high value should be stored in Commons, even if they take some space. Especially if they are somewhat rare, and are likely to be lost. As I mentioned on another request in this same technical needs survey, the Internet Archive, a really, really great idea, right now stores only 2 copies of each archived item, and both of them in the San Francisco area, which has a high seismic risk (in my opinion, a really, really bad idea). I doubt it has enough money to make more backups, given its relatively low budget, and that it stored, as of 2021, 99 PetaBytes (1 PB = 1024 TB) of unique data. Commons currently stores "only" 471.86 TB (only 22.64 TB of videos), all of them replicated in 2 datacenters (both in the USA: Virginia and Texas, in areas with no particular natural risks), plus a complete backup on each (and, probably, additional copies). It also uses RAID (multi-disk setup on each server), according to Wikitech. So, even if Commons doubles in size, it would be 1% of the Internet Archive's size, in an organization with a budget 5 times larger, and with much greater guarantees of preservation.
Yes, Wikimedia must also store many other projects, but they are much smaller in disk space. And it must provide a connection speed, and handle a load of requests, far larger than the Internet Archive. But I think it can help preserve highly valuable content that might otherwise be lost. MGeog2022 (talk) 20:09, 19 December 2023 (UTC)

1 PB ≠ 1024 TB. 1 PiB = 1024 TiB. 1 PB = 1000 TB. SI and IEC prefixes are based on different bases (2 for IEC prefixes, 10 for SI prefixes; otherwise 1km would be 1000 meters, and 1 kB would be 1024 bytes, which is very confusing). Petabytes must never be confused with powers of 2. 99 petabytes are 99000 terabytes. Everything else is technically wrong. I want to pay attention here. The units are unambiguously defined --PantheraLeo1359531 😺 (talk) 21:38, 5 January 2024 (UTC)
Thanks, I was aware of it, but, although technically wrong, the TB notation is usually much more used (and known) than the TiB one (that many people probably don't even understand: even here, TB is used as TiB). I just meant to say that 1 PiB is to 1 TiB the same as 1 TiB is to 1 GiB. MGeog2022 (talk) 14:08, 6 January 2024 (UTC)
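
For reference, the storage figures quoted in this thread can be put side by side in a few lines. The input numbers are the ones cited by the participants (99 PB, 471.86 TB, 22.64 TB) and are not independently verified here; the variable names are only for this sketch.

```python
# SI prefixes are powers of 10; IEC prefixes are powers of 2.
TB_PER_PB = 1000        # 1 PB  = 1000 TB
TIB_PER_PIB = 1024      # 1 PiB = 1024 TiB
BYTES_PER_PB = 10 ** 15
BYTES_PER_PIB = 2 ** 50

# Figures as quoted in the discussion above.
archive_pb = 99
commons_tb = 471.86
commons_video_tb = 22.64

archive_tb = archive_pb * TB_PER_PB                        # 99000 TB
commons_share_pct = commons_tb / archive_tb * 100          # about 0.48 %
doubled_share_pct = 2 * commons_share_pct                  # about 0.95 %
video_share_pct = commons_video_tb / commons_tb * 100      # about 4.8 % of Commons
pib_vs_pb = BYTES_PER_PIB / BYTES_PER_PB                   # about 1.126

print(f"Commons is {commons_share_pct:.2f}% of the quoted Archive size")
print(f"Doubled, it would be {doubled_share_pct:.2f}%")
print(f"Videos make up {video_share_pct:.1f}% of Commons storage")
print(f"1 PiB is {pib_vs_pb:.3f} PB")
```

Whichever prefix is meant, the conclusion that a doubled Commons would be roughly 1% of the quoted Archive size changes only slightly, since 1 PiB is about 12.6% larger than 1 PB.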
Only if it lets the videos upload successfully.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 00:15, 20 December 2023 (UTC)
"...and are likely to be lost". To play devil's advocate for a minute, I wonder how often this is actually the case. Once a public domain video has been digitized, it seems to typically proliferate (as a free source of monetization) rather than disappear. Are there any known examples of a video that has been digitized and subsequently lost? While I do think Commons should definitely host educational videos, I think more effort should be focused on getting modern videos freely licensed (through content partnerships and video creation projects), rather than on archiving all old films (which I think archive.org, YouTube, and other platforms can do better). Plus video streaming is extremely expensive. $154 million may seem like a lot, but if Commons actually became known as a video platform that money would evaporate very quickly. I don't remember where, but I remember seeing a breakdown created by TheDJ that was pretty informative. Regardless, I do think having more guidance from the WMF on this would be very helpful. Nosferattus (talk) 17:37, 21 December 2023 (UTC)Reply[reply]
@Nosferattus, my complete sentence was: "Specially, if they are somewhat rare, and are likely to be lost". Rare videos are more likely to be lost than other more widely known ones.
but if Commons actually became known as a video platform: that's why I said such things as "selected videos of high value", I know that Commons can't allow uploading any video that someone wants to (useless images take up much space, and with videos it's much, much worse).
I think more effort should be focused on getting modern videos freely licensed: again, with some strict criteria, I understand. Otherwise it could be the same problem you talked about, or even worse.
Once a public domain video has been digitized, it seems to typically proliferate (as a free source of monetization) rather than disappear: probably yes, but this doesn't eliminate the need to have a copy stored indefinitely somewhere, to ensure its preservation.
which I think archive.org, YouTube, and other platforms can do better: YouTube is a commercial platform, and archival or preservation are not part of its goals. Uploaders can delete the content they uploaded, and if their account gets closed, their videos can be deleted over time. Archive.org always seems to make the wrong decisions: as I said before, they store all that they can, so they can't have more than 2 copies (and they don't even use RAID disks: when a disk fails, there is only 1 copy for a time, while the disk is replaced; at least once, they lost content due to a defective disk; this doesn't seem like best practice for an archive). As if this was not enough, both copies are placed in San Francisco, where a strong earthquake can cause severe trouble at any moment. They also take legal risks that cost them money from their already small budget for what they are trying to do. Of course archive.org would be the right place for video archival, I hope in the future they get more money or make wiser decisions, but, for now, I think they won't really achieve their goals (archival and preservation of content for an indefinite period), despite their good intentions and the great idea that the project is.
Plus video streaming is extremely expensive: perhaps some big videos could be offered only for downloading and not streaming, for example. MGeog2022 (talk) 19:55, 21 December 2023 (UTC)
All of which seems to support my statement that we need some guidance from WMF about the parameters within which we are to operate. - Jmabel ! talk 00:08, 22 December 2023 (UTC)
Yes, I wasn't questioning it at all. I was only providing arguments in favor of supporting large video files in Commons (arguments that can be forwarded to WMF when asking for guidance). MGeog2022 (talk) 13:23, 22 December 2023 (UTC)

Okay, let me make a comparison. MGeog2022 said Commons is currently hosting 471.86 TB of data. Recently Apple introduced an option for up to 12 TB of iCloud storage for every single one of their almost one billion iCloud users. With only 40 iCloud users at 12 TB each, you have more storage available than Commons is currently using. iCloud 12 TB comes with a lot of other services and costs only 60 euros a month, so 40 users x 60 euros = 2,400 euros a month (28,800 a year). Compare that to the 180 million dollars of revenue Wikimedia has.

Digital storage is so cheap it almost never does matter in the balance sheet.

And by the way, because of Internet Archive: deleting things is easy, restoring lost things is not. Killarnee (talk) 08:25, 27 December 2023 (UTC)

@Killarnee, Apple's budget is far bigger than that of the Wikimedia Foundation. iCloud could cost more than what the users pay for it, with Apple putting additional money into it. Also, I don't know how many copies they store, for example. But, yes, storage is relatively cheap, and will probably be much cheaper in the future, so I reaffirm that free content of high value should always be accepted into Commons. There are new technologies currently in development that will be of great help to the Wikimedia Foundation, and perhaps in the near future even enable the Internet Archive to have more copies of all its content in different locations. I hope that arrives sooner than an earthquake cutting off the network connection in their San Francisco datacenters, with some disk damage, and failing old disks doing the rest in the following few days... Of course not all content would be lost, but there would be random losses for sure. MGeog2022 (talk) 12:15, 27 December 2023 (UTC)

Left some comment on the talk page about this. The support for that is already there. Recently created Category:Videos of films by year. One thing that is missing is enabling multiple audio tracks (the sooner that is possible the better) and the ability to specify the thumbnail image. Agree with MGeog2022 regarding the costs. I think it would be a good idea that for large filesize films only a reasonable size is stored with the higher-res version linked as is done for the 2018 documentary I just uploaded. Another idea is to embed media hosted on Internet Archive or even using decentralized storage like IPFS but that probably isn't necessary. There aren't many films in the public domain and those that are aren't watched; that seems unlikely to change dramatically within the next decade. --Prototyperspective (talk) 22:38, 28 December 2023 (UTC)Reply[reply]

@Prototyperspective: I believe it is likely to change dramatically in the next decade. Right now, as far as commercial films, for the most part unless someone screwed up their copyright registration only silent films are PD in the U.S. By 2030, every pre-Code Hollywood film will be PD. Ten years after that, every Thirties musical, etc. I don't think we can just blithely assume that this (streaming especially, more than storage) won't become a technical issue. Are you saying we don't need to discuss this with the tech side of WMF? - Jmabel ! talk 23:42, 28 December 2023 (UTC)Reply[reply]
I think it's going to change dramatically at some point, but not this close albeit I haven't checked thoroughly exactly which files will be in the public domain by then. See MGeog2022 notes about how much the Internet Archive stores and those still aren't many (YouTube probably gets around that video duration/size per ~day) and those that could be here aren't watched by many. No, not saying that. Maybe there should be a request that the tech side notifies the community early enough once and if this is anticipated to become a problem within the next ~7 years with some info on open problems, possible solutions, and required deliberations. Prototyperspective (talk) 11:58, 29 December 2023 (UTC)Reply[reply]
Expenses:
Salaries and wages 67,857,676
Awards and grants 9,810,844
Internet hosting 2,384,439
In-kind service expenses 473,709
Donation processing expenses 6,386,483
Professional service expenses 12,084,019
Other operating expenses 10,383,125
Travel and conferences 29,214
Depreciation and amortization 2,430,310
Special event expense, net -
Total expenses 111,839,819
Increase in unrestricted net assets 51,046,867

https://wikimediafoundation.org/about/annualreport/2021-annual-report/financials/#section-1

Internet hosting is 2.4 million; that's only about a third even of the donation processing expenses. I stand by the point that if there really are money problems, then you have to start saving somewhere else.
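
The figures in the expense list above can be checked with a few lines of arithmetic. The amounts are copied from the list as quoted (in US dollars); the dictionary keys are shortened labels chosen for this sketch.

```python
# Expense lines as quoted above, in US dollars.
expenses = {
    "salaries_and_wages": 67_857_676,
    "awards_and_grants": 9_810_844,
    "internet_hosting": 2_384_439,
    "in_kind_services": 473_709,
    "donation_processing": 6_386_483,
    "professional_services": 12_084_019,
    "other_operating": 10_383_125,
    "travel_and_conferences": 29_214,
    "depreciation_and_amortization": 2_430_310,
}

total = sum(expenses.values())
hosting = expenses["internet_hosting"]

hosting_share_pct = hosting / total * 100                          # share of all expenses
hosting_vs_donation = hosting / expenses["donation_processing"]    # ratio of the two lines

print(f"Total expenses: {total:,}")
print(f"Internet hosting share of total: {hosting_share_pct:.1f}%")
print(f"Hosting relative to donation processing: {hosting_vs_donation:.2f}")
```

The line items add up to the quoted total of 111,839,819; internet hosting is about 2.1% of that total and roughly 0.37 times the donation processing expenses, which is consistent with the "about a third" statement.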

People go to YouTube because it's a social network, there are algorithms that keep you engaged and you can like and write comments. None of this is possible here, which is why the "streamers" are unlikely to be on Commons to a relevant extent. Anyone who claims otherwise should please provide evidence. For me the numbers are clear.

The real question is how to make commons more attractive. There is competition like YouTube or Pixabay etc. Right now Commons is a site with a bunch of files. This is too technical for most people who are just looking for pictures or videos. But instead of discussing how to make commons more marketable, we are of course only discussing what may not be here. Killarnee (talk) 05:43, 30 December 2023 (UTC)

100% agree. While this is about potential future problems, it's just speculation and MGeog2022 already made good points regarding how much the Internet Archive is able to host alongside potential ways these problems could be mitigated if they ever become a problem (IPFS/PeerTube to size-limited videos). question is how to make commons more attractive … now Commons is a site with a bunch of files that is basically what my proposal for a 'wall of images view for category pages' is about and once that's possible you could also switch the filter to "videos" to show the videos which could even have a standardized permalink like "…Category:Videos of documentaries from the 21st century?v=wv". Other than that I'm not sure what you mean with "more marketable"; do you have some idea and does it also refer to Web search engine indexing? Prototyperspective (talk) 09:27, 30 December 2023 (UTC)Reply[reply]
I'd like to point out that I'm not suggesting that Commons/Wikimedia Foundation store nearly as much content as Internet Archive does, because, although WMF has a far bigger budget, Archive stores much content of little or questionable value, and stores only 2 copies of each file in all, while Wikimedia Foundation stores 8 copies (2 production copies and 2 backups, all of them on RAID disks with 2 copies each (source: Wikitech)); that is, 4 times more copies at WMF, although its budget is also more than 4 times larger. The strength of Commons is in part due to not having a policy of accepting everything, as Archive has. So I think Commons should (precisely because of the limitations Archive has) host as much valuable content as it can, but without missing the point of being a manageable collection, with proper backups, etc. Of course, limits have to be placed to avoid converting Commons into an "Archive bis", suffering from the same kind of problems. Commons now is 0.5% of Archive. It could possibly be 1, 2 or 3% of it without many problems (and of course without reducing backups and copies, or making it difficult to improve them from their current state), and this would allow hosting many important videos (we are talking here about really many TB of storage), but (unless there is a revolutionary technological advance) it has to avoid becoming something resembling a 100 petabyte mess. MGeog2022 (talk) 20:34, 30 December 2023 (UTC)
wiktionary:marketability
A main page that is not created by hand every day, but shows unique suggestions generated for each user. Instead of MediaWiki markup, storing all values individually in the database. Mediawiki markup and alternatives such as Markdown are intended for software developers to write technical documentation, so you can see that there are a bunch of motivated people here, but there is no trace of an idea about what the end customer actually needs. They usually have no idea about technical documentation and Mediawiki markup and shouldn't have to learn it first.
It's astonishing, Wikimedia is the only major organization in the world without a marketing department.
But these are all changes that cannot be made so easily. Someone has to come here and, especially in the software department, say how it is done and then it is implemented. Completely new from scratch, not always this messing around with tools. At the moment, no one is really taking responsibility, especially because there is a lack of knowledge. I mean honestly, there's a discussion going on here like it happens every day in management and here I'm the first to come up with a data sheet. Killarnee (talk) 04:15, 1 January 2024 (UTC)
"Wikimedia is the only major organization in the world without a marketing department." Which is part of why I remain active in the site. - Jmabel ! talk 19:26, 1 January 2024 (UTC)
And which is part of why Wikimedia Commons is not popular except for its connection to Wikipedia. Killarnee (talk) 17:46, 3 January 2024 (UTC)
It doesn't need marketing but rather paying attention to and identifying what inhibits it from being more popular, such as user experience, modern UI, intuitiveness, Web search indexing and so on. I proposed the wall of images view to address this, which hasn't been picked up yet. What would be the hypothetical way that "marketing" would make WMC more popular and more used rather than, say, category pages showing up at the top of Web search results and being designed to be really useful? There are further things brought up elsewhere that may affect, say, search engine indexing. "Marketing" is basically a waste of money and very broad. There already are Twitter accounts for Wikimedia Commons and some campaigns like WikiLovesEarth; they didn't change much, and anything similar to them but better won't either. Prototyperspective (talk) 17:53, 3 January 2024 (UTC)
More than marketing, perhaps what is needed is awareness of the importance of free knowledge and freely licensed media. If we are talking about marketing, Commons probably doesn't need it now... is there any other well-known repository of free knowledge media competing with Commons? If people don't use Commons enough, perhaps it is because they are not aware of the license for the media they are reusing, or don't have much interest in sharing or getting knowledge. The work should be on improving this (and I hope Wikimedia Foundation is doing it quite well), and not trying to get "latest trending dance in front of a cute cat" videos into Commons; that's not the way. MGeog2022 (talk) 14:50, 4 January 2024 (UTC)
I think the two things to mention are the Internet Archive and the Creative Commons search which also includes media on WMC. It's less awareness of importance than awareness of how it can be used, how it can be useful, and so on. For example, why are people worldwide basically recording the same documentaries 30 times each only for one specific language rather than making it available in the public domain so it can be translated and raise awareness on critical international issues such as deforestation? I mean wouldn't you want your labor to be efficient? And raise global education on major issues? In that case, it's not marketing but something comparable to policy activism or lobbying and so on. (The same thing would also be trying to get large media websites like artstation and reddit to make it possible for original uploaders to easily license their works under CC BY.) In addition: making the search engine more useful and making people aware WMC is there (for example via using the globally highly popular website Wikipedia) and in a way that people find useful / are interested in could be good. Lastly, it would be nice if they did something to get more illustrators onboard and enable connecting gaps (examples) (similar to requested images) to those who could close them. Prototyperspective (talk) 16:00, 4 January 2024 (UTC)
@Prototyperspective, yes, I totally agree, we need awareness of how useful free content hosted on Commons can be, and how it can be used to easily create derived works with much less effort. It's about getting people to create or get as much useful free content as possible, and sharing (and so preserving) it at Commons, but not marketing (true marketing would only make sense if there were another successful website doing exactly the same as Commons). These are things that concern WMF, and I hope they will go in this direction, if they aren't doing it enough. As for search engines, I've noticed that Google Images doesn't index many unused Commons images; I don't know if something can be done (sorry, we're getting a bit off-topic from Massive support for video (or not), since this "marketing" thing came up). MGeog2022 (talk) 20:07, 4 January 2024 (UTC)
that can easily change.
i had an idea of creating a website like fmovies, but with all pd movies hosted on commons. then people will be streaming without even paying attention to the fact that they're streaming from commons. :) RZuo (talk) 00:05, 1 January 2024 (UTC)
So we shouldn't let other websites embed files from Commons. So easy. Killarnee (talk) 04:19, 1 January 2024 (UTC)
@Killarnee, please remember that Commons isn't a commercial product. It would be great if, for example, YouTube offered all Commons videos for streaming, so Google would donate lots of much needed money to Wikimedia Foundation, while, on the contrary, if it sees Commons as a competitor, it could cease to donate any money to Wikimedia. Commons must seek to meet its goals, not try to compete with commercial platforms. MGeog2022 (talk) 17:59, 1 January 2024 (UTC)
By the way, thanks for the WMF expenses table. It really strikes me that about 5% of expenses are "Donation processing expenses", but they are probably bank fees that are unavoidable. MGeog2022 (talk) 18:07, 1 January 2024 (UTC)
See User:Spinster/WikiFlix. Yann (talk) 17:10, 4 January 2024 (UTC)
Yeah, we already have a good collection of old public domain feature films here. And, I'm no expert, but perhaps some of them (for example, some of the Soviet or Indian ones) could be rare movies that are especially important to preserve. MGeog2022 (talk) 20:17, 4 January 2024 (UTC)

Media dumps

Description of the Problem

  • Problem description:

There are no Wikimedia Commons dumps that include any media. There's an open Phabricator ticket since 2021 (T298394), but no major advances have been seen. The root of this problem seems to be fundamentally in the enormous size that the sum of all media currently in Commons has (almost 500 TB). Fortunately, thanks to the hard work of some guys, Commons media now have 2 backups at very distant locations (https://phabricator.wikimedia.org/T262668, https://wikitech.wikimedia.org/wiki/Media_storage/Backups), although in the same data centers as the primary copies. Having copies in more locations would provide greater security, considering the value of some of the content hosted.

  • Proposal type: bugfix / feature request / process request

process request

  • Proposed solution:

There's no need at all to include ALL Commons media in dumps. The focus should be on images with special value, such as historical photographs or documents (here, historical does not necessarily mean old) or featured pictures. Using categories, it should be easy to select all pictures depicting paintings, books, documents, maps (with some kind of filter to exclude user-made or trivial maps, such as country location maps, which individually take very little space but of which there are lots and lots), or photos of special historic value (again, they can be very recent, provided they depict something truly historic). Featured pictures are easy to select since they belong to specific categories. This collection (a subset of Commons) could be split by topic, to have even smaller individual dumps. These dumps could then be distributed to mirrors around the world (for example, in libraries or universities that volunteer to host them, using a model similar to Debian mirrors). Internet Archive would be another location to host them, but, since it stores only 2 copies of each item, both of them in the San Francisco area, with high seismic risk, it sadly probably cannot be relied on for long-term preservation, unless they improve this in the future, or the paid Archive-It service (https://support.archive-it.org/hc/en-us/articles/208117536-Archive-It-Storage-and-Preservation-Policy) is used (they store more copies in other locations when using this option).

  • Phabricator ticket:

T298394

  • Further remarks:

This proposed solution is only a set of general ideas that obviously need much more revision and elaboration, but the basic goal is to have at least dumps with the media that is deemed most important (criteria and technical aspects apart). Having backups of all media in other locations besides the 2 main datacenters would be another, perhaps even better, solution. It costs money, but it should be a priority in the budget, as the Wikimedia Foundation Mission states: "The Foundation will make and keep useful information from its projects available on the internet free of charge, in perpetuity."
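The category-based selection and the split into per-topic dumps described in the proposal could be sketched roughly as follows. This is only an illustration under stated assumptions: the `MediaFile` record, the topic names, and the category names in `TOPIC_CATEGORIES` are all hypothetical and not taken from any existing Commons tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record; real Commons metadata has many more fields.
@dataclass(frozen=True)
class MediaFile:
    title: str
    size_bytes: int
    categories: frozenset


# Hypothetical mapping from dump topic to the categories that feed it.
TOPIC_CATEGORIES = {
    "featured": {"Featured pictures"},
    "maps": {"Old maps"},
    "documents": {"Historical documents"},
}


def plan_dumps(files):
    """Group whitelisted files by topic; files matching no topic are left out."""
    dumps = defaultdict(list)
    for f in files:
        for topic, cats in TOPIC_CATEGORIES.items():
            if f.categories & cats:
                dumps[topic].append(f)
                break  # each file goes into the first matching topic only
    return dict(dumps)


def dump_sizes(dumps):
    """Total size in bytes of each planned per-topic dump."""
    return {topic: sum(f.size_bytes for f in fs) for topic, fs in dumps.items()}
```

Putting each file into only one topic keeps the per-topic dumps from duplicating data; a real implementation would need a policy for files that belong to several topics.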

Discussion

  • Tending toward Support but some things are a bit unclear. Are you saying there are 3 backups of WMC at a distant location but all at the same place so there should be a fourth at another place? (How many and do you request one backup to be moved or another full backup?)
I'm thinking about whether there are methods of excluding files to reduce the size but that could also introduce problems due to lower data loss severity. Maybe there are some categories of files where the file-sizes are very large despite being of little use where all or all unused files could be excluded from the dump. Or there could be very small backups of all files that are in use or otherwise likely valuable. I think approaches that exclude files (such as all videos longer than 10 minutes or larger than 200 MB plus all uncategorized unused images etc) rather than use a whitelist approach would be best.
I think small Wikimedia Commons datadumps of files as well as metadata (like file descriptions and cats) would be useful and so far haven't found it.
More full backups should certainly be done once there are new technologies of sustainable long-term large-scale data storage. For now, when non-public full backups are concerned 3 backups does seem possibly enough. --Prototyperspective (talk) 11:47, 29 December 2023 (UTC)
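The exclusion approach suggested in the comment above (drop videos longer than 10 minutes or larger than 200 MB, plus uncategorized unused images) could look roughly like this. It is a minimal sketch, not an existing tool: the `FileInfo` fields are hypothetical, and "200 MB" is assumed to mean decimal megabytes.

```python
from dataclasses import dataclass

MAX_VIDEO_SECONDS = 10 * 60          # "longer than 10 minutes"
MAX_VIDEO_BYTES = 200 * 1000 * 1000  # "larger than 200 MB" (decimal MB assumed)


# Hypothetical per-file metadata record.
@dataclass(frozen=True)
class FileInfo:
    title: str
    media_type: str            # e.g. "image" or "video"
    size_bytes: int
    duration_seconds: float = 0.0
    category_count: int = 0    # number of categories the file is in
    usage_count: int = 0       # number of pages using the file


def is_excluded(f):
    """Return True if the file falls under one of the exclusion rules."""
    if f.media_type == "video" and (
        f.duration_seconds > MAX_VIDEO_SECONDS or f.size_bytes > MAX_VIDEO_BYTES
    ):
        return True
    if f.media_type == "image" and f.category_count == 0 and f.usage_count == 0:
        return True
    return False


def select_for_dump(files):
    """Keep everything that no exclusion rule rejects."""
    return [f for f in files if not is_excluded(f)]
```

Unlike a whitelist, this keeps every file by default, so a new kind of valuable content is included without anyone having to add a rule for it.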
@Prototyperspective, I said that there are currently 2 backups, but they are at the same datacenters as the primary copies (there are 2 primary copies at the Virginia and Texas datacenters, so the 2 datacenters are distant from each other, but there is no backup outside those 2 places, and I think it is advisable to have at least a third place, especially if there are no media dumps that add more copies).
I think approaches that exclude files (such as all videos longer than 10 minutes or larger than 200 MB plus all uncategorized unused images etc) rather than use a whitelist approach would be best.: I totally agree: surely it would be a lot easier and produce a better result, thanks (I would add trivial maps (for example, <1MB, or 500 KB) to the exclusion list, though, because there are lots of them and are of very low value themselves, but perhaps I'm obsessed with it and they don't take so much space).
I think small Wikimedia Commons datadumps of files as well as metadata (like file descriptions and cats) would be useful and so far haven't found it.: there haven't been any since 2013 (the current dumps only include the metadata). The Phabricator ticket linked above contains more info about it. Size seems to be the main problem, so excluding certain files and creating several dumps instead of one would be of great help, I think. MGeog2022 (talk) 12:57, 29 December 2023 (UTC)
copies=backups (copies of the data) Please be clearer, this is very ambiguous. I think you're saying there is the database and two backups one of which is at the same place as the live database and that a third at another location would be good.
Missed saying that it could also be the case that excluding large files or categories containing large low-usefulness files wouldn't make much of a difference: it could be that the large size comes from the number of small–medium sized files dispersed all across WMC. For example if it reduced the size by 50TB it wouldn't make much of a difference at 500TB. A treemap of filesize by categories&filetypes or sth similar could be very useful (an issue with that is that files are in multiple categories). I do think that excluding unused+unlikely-to-be-very-useful videos would make a substantial difference in filesize.
A third backup at a third location is something I support. Prototyperspective (talk) 13:38, 29 December 2023 (UTC)
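The per-category and per-filetype size breakdown mentioned in the comment above (the data a treemap would be drawn from) could be computed along these lines. This is a sketch on hypothetical in-memory records; splitting a file's size evenly across its categories is just one assumed way of handling files that sit in multiple categories.

```python
from collections import Counter


def size_by_filetype(files):
    """Total bytes per file type; each file has exactly one type."""
    totals = Counter()
    for f in files:
        totals[f["filetype"]] += f["size_bytes"]
    return totals


def size_by_category(files):
    """Total bytes per category, splitting each file evenly across its categories
    so that the per-category totals still add up to the overall size."""
    totals = Counter()
    for f in files:
        cats = f["categories"] or ["(uncategorized)"]
        share = f["size_bytes"] / len(cats)
        for c in cats:
            totals[c] += share
    return totals


def saved_fraction(excluded_bytes, total_bytes):
    """Fraction of the total that an exclusion rule would save."""
    return excluded_bytes / total_bytes
```

With the figures from the comment, `saved_fraction(50, 500)` gives 0.1, i.e. removing 50 TB from a 500 TB collection saves only a tenth.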
@Prototyperspective, copies and backups are not the same (not all copies can be called backups). To be clearer, there are 4 copies: a production copy and a backup at each datacenter (that is, 2 production copies, and 2 backups). An additional backup at a third place would be fine, especially if there are no dumps.
Missed saying that it could also be the case that excluding large files or categories containing large low-usefulness files wouldn't make much of a difference: certainly, additional information would be needed. That is why I initially proposed a whitelist instead of a blacklist: include only those files that are deemed to be especially important, to have working dumps as soon as possible (including more, even all, files, would produce better dumps, but if it is at the expense of keeping a Phabricator ticket open for 10 years (the existing one has been there for 2 years, and no major progress is seen), I choose the most practical solution). More should be known about budget and technical issues before opting for one or the other (whitelist or blacklist). MGeog2022 (talk) 19:07, 29 December 2023 (UTC)
Another thing: enormous size that the sum of all media currently in Commons has (almost 500 TB) is very misleading: 500TB is very little if that is the actual size. Do you have a source or chart for that number?
That doesn't mean it couldn't change substantially with more HD videos getting uploaded (which could become a problem but isn't one now). So it seems like there are enough backups, but considering the small size, setting up an additional one at a third location in the near future indeed seems like a good thing to do. I don't think it's one of the most important issues at this point though. However, there should be a way for people to download rather than scrape all of WMC (or any select parts of it such as all of its images or all but videos); that seems more important, but it's not clear if this proposal is also about that. I don't know if there is a text-only Wikipedia dump and a small WMC dump of all files used by it that you can combine if you have any versions of both (modular). --Prototyperspective (talk) 13:25, 31 December 2023 (UTC)
@Prototyperspective, this is the source for the size of all media currently in Commons.
500TB is very little if that is the actual size: it depends on what you call "big" or "little". The fact is that there have been no media dumps for 10 years, and size seems to be the main cause (see here: generating and distributing 400TB of data among the many consumers that will likely be interested on those will still require some serious architectural design (e.g. compared to serving 356 KB pages, or 2.5T wikidata exports)). Storing 500 TB is not the same as distributing it as dumps (in fact I doubt it is currently possible with a single dump).
I don't know if there is a text-only Wikipedia dump: yes, there are, for all languages, and not only for Wikipedia, but also for all other Wikimedia projects (see here).
and a small WMC dump of all files used by it: no, there isn't any media dump. Again, "small" is relative, taking into account that English Wikipedia dump (as compressed text, and I think that it includes full version history for all currently existing articles) is only 21.2 GB in size (obviously, when there were Commons media dumps 10 years ago, they were far bigger than this). I am sure we are talking about a challenge for the technical team, since a ticket for this has been open for 2 years now. With this proposal, I'm trying to make it possible, trying to greatly reduce the dump size.
setting up an additional one at a third location in the near future indeed seems like a good thing to do. I don't think it's one of the most important issues at this point though: things are taken for granted, until they aren't. I'm not saying that Commons backup policy is wrong (it is far better than Internet Archive's, for example, of course), and I think a catastrophic loss of Commons content is highly unlikely. But all other Wikimedia content (text) is distributed as dumps in mirrors outside Wikimedia Foundation datacenters, while media isn't. So in fact there are more backups for all other content (including past vandalized versions of Wikipedia articles, for example) than for any media, no matter how important it is. I think Wikimedia Commons is something really unique, especially if you think about its relationship with other projects such as Wikisource and Wikipedia. All this together is a really unique collection, and a freely distributable one (the sum of all human knowledge, as the Wikipedia slogan says, or, at least, the currently freely licensed part of it). And I believe it should be made easy to distribute copies of it around the world, since libraries or universities would probably be interested in hosting them. MGeog2022 (talk) 14:06, 31 December 2023 (UTC)

File verification

Description of the Problem

  • Problem description:

Source websites from which content is uploaded to Commons may cease to exist over time. Once that happens, files that originate from them could easily (especially when certain conditions are met) be mistaken for copyright violations. Also, even when the source website still exists and has the uploaded file available, there can be mistakes that lead to a file being deleted in error (just have a look here). Another problem is vandalism: if the file page was vandalized, the file's source could be missing or have been changed (yes, file history should be reviewed before deletion, but work overload could lead to it not being reviewed with due care).

  • Proposal type: bugfix / feature request / process request

feature request

  • Proposed solution:

Implement a mechanism to verify uploaded files. When a file uploaded to Commons is patrolled (by a user who has the privileges for it), it could also be publicly marked as verified (this could also be done for already existing files over time). This proposal is similar to what is already being done for images from sites such as Flickr, but now for all files from external sources. A verified file would be more than a simple verification or attribution template (for example, verification couldn't be removed by a vandal, only by an administrator if needed). Of course, we can never be 100% sure, but once a file is verified, an exhaustive investigation would be required before considering it a copyright violation, so the risk of mistaken removal is greatly reduced. Also, users could trust verified files with greater confidence before using them.

  • Phabricator ticket:
  • Further remarks:

If not feasible, an intermediate solution could be not allowing unprivileged users to remove attribution templates (but this would only be a solution for files to which an attribution template applies).

Discussion

Does this amount to placing a request for license review on every upload that comes from a third-party site? That seems excessive. Consider especially material old enough to be out of copyright on that basis, or a PD-ineligible logo. Similarly, a U.S. government doc with internal markings that show it to be that; I'm sure there are many other cases. You'd be taking "patroller" (presumably actually image-reviewer) time to verify something that has nothing to do with the source site. - Jmabel ! talk 19:33, 17 December 2023 (UTC)
If the patroller/image reviewer has indeed verified that the image (or other media) has been published under a free license, I think it would be a very good thing if he/she could mark the file as verified, and this could be visible to anyone. This would even save work for the future: the file is not a copyright violation, so if somebody tags it as such, the deletion request can be quickly dismissed unless some breaking new evidence has been found (this would happen very rarely, if things are well done). Many files are in fact verified (any reviewed media from third parties that is not found to be a Copyvio has been verified, but we can't be aware of what files have been reviewed). As an uploader of many files from Spain's National Geographic Institute, I can say that most of these files include the text "© Instituto Geográfico Nacional. All rights reserved. Total or partial reproduction banned", because they were published before IGN released them under the CC-BY 4.0 license. I'm sure those maps (or at least, most of them) were reviewed and everything was found to be OK. But if in the future the URL from which they were downloaded ceases to exist, someone could tag the file for deletion as Copyvio. The administrator who reviews the deletion request would then see that there's an "All rights reserved" text on the image, that it's only a few years old, and that no evidence of it being CC-BY licensed can be found on the source website, because it doesn't exist anymore. I think that allowing a file to be marked as "Verified" would solve this. On the other hand, as I also said, not allowing unprivileged users to remove attribution templates from files would be another way to prevent that kind of thing from happening. MGeog2022 (talk) 19:53, 17 December 2023 (UTC)
You have given a problem description, but is this an actual problem? Sure there are lots of things that can happen and happen in a small number of cases, but is it worth it to complicate everything else for such a case? The Flickr case is being done because it is so easy to change licenses on material (in bulk). It is not because Flickr can disappear. Additionally we have our upload date/times and page history to deal with any age questions. And as far as I know, we have never had legal problems because of any of this. I think this is a LOT of overhead we are adding, for very little return. --TheDJ (talk • contribs) 12:14, 18 December 2023 (UTC)
What about having a list of safe sources, where only administrators can add sites, after verifying them? Or, as I said, disallowing unprivileged users from removing attribution templates from files uploaded by other users? Certainly I don't know about this ever happening, but I think it's sad to risk losing valuable material due to potential confusions. I think it's especially risky when the media includes a copyright tag from a relatively recent date, with "All rights reserved" text, such as the case I mentioned. I think my proposal is no complication for patrollers: if I understood this page well, most uploaded files are patrolled in search of possible copyright violations. If the file is found to really be under a free license, it would only be a click or 2 away to have it verified by the patroller (much less work than requesting its deletion, if it was a Copyvio). Older files could be verified on demand. On the other hand, perhaps I'm being a little paranoid here, and all that can be found about the source (even in Wayback Machine, if the site no longer exists), file history, etc., is always carefully checked before file deletion. But even if this is the case, my proposal would greatly reduce research work for administrators, if we have files verified in advance. MGeog2022 (talk) 13:35, 18 December 2023 (UTC)
The patrol user right is a specific user right that enables a user to mark edits, file uploads and page creations as patrolled
What I propose is only a publicly visible way to "mark as patrolled", but only at the file level (it could even be fully automatic, when a patroller marks a file that is not own work as patrolled). Once a file was marked this way, it should never be considered a Copyvio, unless very clear evidence is found that it was wrongly verified. MGeog2022 (talk) 14:55, 18 December 2023 (UTC)
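The permission model described in the comment above (patrolling a third-party file sets a public "verified" mark, and only an administrator can remove it) could be sketched as follows. This is an illustration only: the `FileRecord` fields and the group names in `CAN_VERIFY` and `CAN_UNVERIFY` are hypothetical and do not describe how MediaWiki currently works.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical group names; the real Commons user groups may differ.
CAN_VERIFY = {"patroller", "license-reviewer", "sysop"}
CAN_UNVERIFY = {"sysop"}


@dataclass
class FileRecord:
    title: str
    own_work: bool
    patrolled: bool = False
    verified_by: Optional[str] = None


def mark_patrolled(record, user, groups):
    """Mark a file as patrolled; third-party files are verified at the same time."""
    if not CAN_VERIFY & set(groups):
        raise PermissionError(f"{user} may not patrol files")
    record.patrolled = True
    if not record.own_work:
        record.verified_by = user


def remove_verification(record, user, groups):
    """Only administrators may withdraw a verification."""
    if not CAN_UNVERIFY & set(groups):
        raise PermissionError(f"{user} may not remove a verification")
    record.verified_by = None
```

Keeping the verification in a protected field rather than in page wikitext is what would stop an ordinary edit (or a vandal) from removing it.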
It might be more useful to codify what the "certain conditions" are, in guidelines or policy. I tag a lot of files as copyright violations, and a very common scenario is that an image is clearly an old stock image since it's being used on dozens of websites, and its usage predates whatever date the uploader claimed (i.e. the uploader says it was their work from 12/18/2023, but it's showing up on the web as early as 2013), but the stock site no longer exists so it may well have been under a free license. I would suspect that sites disappearing causes more false negatives than false positives (another common scenario is that an uploader has several files, a few of which appear online before the date they claimed and a few of which did not, and while the other images are probably copyvios there's no proof). But I don't have any proof of that, and by nature it's probably impossible to get. At any rate I think the most likely result of this would be creating another backlog of hundreds of thousands of files. Gnomingstuff (talk) 23:11, 18 December 2023 (UTC)
@Gnomingstuff, the "certain conditions" which I was referring to were, as I mentioned later, for example, CC-BY licensed maps that include a "© Instituto Geográfico Nacional. All rights reserved. Total or partial reproduction banned" text, because their initial publication date predated when they were released under a free license. I hope that before deleting a file, its history is carefully checked, that it's checked whether there are other files from the same source in Commons, etc. But anyway, the risk still exists, and administrators may have to do much research work in case of such a deletion nomination.
Also, talking about stock image sites, as you mentioned, it has happened that people have uploaded public domain photos from Commons to stock photo sites, without mentioning they were public domain (this is absolutely legal: public domain imposes no obligation), and they were later deleted from Commons as Copyvio, because they were present at a stock photo site (I read about this some time ago, sorry but I can't find the source now). If they had been verified as public domain at the moment they were included in Commons, this wouldn't have happened (a file's presence on a stock photo site should raise the alarm, but then who was right should be carefully investigated, not automatically assuming that the uploader to the stock photo site was right, without even consulting the site's owners). MGeog2022 (talk) 13:08, 19 December 2023 (UTC)
creating another backlog of hundreds of thousands of files: that is the last of my intentions: having many unverified files is no problem, since all files are unverified now. The idea is to have as many verified files as possible (it would be easier for new files: they could be patrolled and verified at the same time), prioritizing those that may potentially have more problems, and those that are deemed most important, or whose uploader (or other user) requests them to be verified. MGeog2022 (talk) 13:13, 19 December 2023 (UTC)
To clarify my example, I'm not saying that IGN will cease to exist tomorrow: I know this won't happen. But please think about the following scenarios:
• It's deemed that https://centrodedescargas.cnig.es/ URL is too long: it's changed to www.cnig.es, so the original URL exists no more.
• IGN decides that there's no need for a separate institution (CNIG) for cartography distribution. CNIG integrates into IGN, so https://centrodedescargas.cnig.es/ ceases to exist as well.
• (I hope this never happens) The government considers the production of new maps too expensive, so it charges a tax on commercial use of new cartography, and new maps aren't CC-BY licensed. Only an obscure notice at the website says: "Maps published before 20XX are CC-BY licensed", while "All rights reserved" text is clearly visible.
• EU countries join their national mapping agencies into a unified European one. IGN ceases to exist as such.
In any of these cases, an administrator who is not familiar with IGN and sees a map that includes "© 2011 IGN. All rights reserved" could possibly delete the file (it perhaps could happen even now, though I hope due care is always taken). IGN maps are in Commons thanks to talks between Wikimedia Spain and IGN (see here; in Spanish), and thousands of maps have been uploaded since by many users. I think we should avoid risking the loss of any of them.
Apart from file verification for third-party works, I think that a notice such as "This image wasn't found in Google Images as of 22 December 2023" would be a good thing for user-created works, to prevent such things as photos being "stolen" by uploaders to stock photo sites (with a user's own work, we never can be 100% sure, but this would indicate that a more detailed and calm investigation is needed before deletion). MGeog2022 (talk) 13:57, 22 December 2023 (UTC)
Another case where we should finally start using software. Files should also be scanned by a bot that does a TinEye and/or Google reverse image search to check if a file is likely a copyvio, especially for new uploaders. The bots should populate some categories which people then quickly review via some tool. Here you can see an example (new study): Category:Wikipedia citations improvement AI-based system SIDE. In this case the tool should check the source link for a) whether it is still online (if not, add an archived version link) and b) whether it supports the file's claim of license & source (if not, flag as needing semi-manual review). --Prototyperspective (talk) 22:47, 28 December 2023 (UTC)
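The source-link check described in the comment above could be sketched as below. The code makes no network requests; it only builds a query for the Internet Archive's Wayback availability endpoint and interprets a response of the shape that endpoint is assumed to return (`archived_snapshots.closest` with `available` and `url` keys). The function names and the three action labels are hypothetical.

```python
from urllib.parse import urlencode

# Wayback Machine availability endpoint (response shape assumed, see lead-in).
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(source_url):
    """Build the query URL asking whether an archived snapshot exists."""
    return WAYBACK_AVAILABILITY + "?" + urlencode({"url": source_url})


def closest_snapshot(response):
    """Return the archived URL from a parsed JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def review_action(source_online, license_confirmed, snapshot_url):
    """Decide what a hypothetical bot should do with one file's source link."""
    if source_online:
        # b) the live page must support the claimed license and source
        return "ok" if license_confirmed else "needs-review"
    # a) the page is gone: add an archive link if one exists
    return "add-archive-link" if snapshot_url else "needs-review"
```

Deciding whether a page "supports the file claim of license & source" is the hard part and is left as an input here; in practice it would probably need the semi-manual review the comment mentions.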
@Prototyperspective: Google and Tineye would probably balk at this unless we got permission first.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 01:06, 29 December 2023 (UTC)
Good point. Maybe that permission could be acquired, though, and chances seem high that they're okay with it. I don't think semi-automatic scans are disallowed at these sites, so one could also have a semi-automated way, like a button "Scan this whole category+subcategories via image reverse search", with each file getting a new datetime field for the last TinEye/Google reverse image search so it's not scanned twice (for now). That scan could run for a long time but would finish eventually. Prototyperspective (talk) 11:28, 29 December 2023 (UTC)
I might have misunderstood the issue, but isn't the Wayback Machine the obvious choice? It takes a snapshot of a website and can be used as a citation --PantheraLeo1359531 😺 (talk) 09:40, 5 January 2024 (UTC)
A bot would need to make sure there are archived versions of all links in file sources / descriptions. Or are you going to go through 100 million files by hand to manually create snapshots where there are none? It's rare (that's why I wouldn't consider it top priority now) to see files with dysfunctional unarchived links, but it happens and could happen more often if some larger sites go down within the next few decades. Prototyperspective (talk) 10:17, 5 January 2024 (UTC)
Not all sites are necessarily archived on the Wayback Machine. In any case, my proposal goes far beyond source sites disappearing (perhaps I put too much focus on that part). I'm also talking about possible human errors when a file seems to be a copyright violation but there is obscure proof that it isn't, or wasted work when a file from a safe source is nominated for deletion (I say this not knowing whether a safe list exists; I'm not aware of one and I couldn't find it). From another point of view, I do know that I might have to "defend" my uploaded files (that is, provide more evidence about source and license) over the next days or weeks, if someone doubts they are rightly licensed... but (at least for me) it makes no sense having to "defend" your uploads years after you uploaded them (and uploaded files can be nominated for deletion at any moment). MGeog2022 (talk) 13:13, 5 January 2024 (UTC)
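The source-link check discussed in this thread (is the link still online, and if not, is there an archived copy to link instead) could be sketched roughly as follows. This is only an illustration, not an existing Commons bot: it assumes the Internet Archive's Wayback availability endpoint and its documented JSON shape, and the function names and triage labels ("ok", "add-archive-link", "needs-review") are hypothetical. No network call is made here; the sample response is hand-written in the documented shape.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wayback Machine availability endpoint.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(source_url: str) -> str:
    """Build the availability lookup URL for a file's source link."""
    return f"{WAYBACK_AVAILABILITY}?{urlencode({'url': source_url})}"


def closest_snapshot(response_text: str):
    """Return the archived URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def triage(source_online: bool, snapshot_url):
    """Hypothetical labels a bot could turn into maintenance categories."""
    if source_online:
        return "ok"
    if snapshot_url:
        return "add-archive-link"
    return "needs-review"


# Hand-written sample in the shape of an availability response.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
})

snapshot = closest_snapshot(sample)
print(triage(source_online=False, snapshot_url=snapshot))
```

Whether the archived page actually supports the claimed license would still need a human (or a far more elaborate tool); the sketch only covers the "is there an archived copy" part.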

Bots

Description of the Problem

  • Problem description:

Some bots don't do what they used to do.

  • Proposal type: bugfix / feature request / process request

feature request

  • Proposed solution:

Provide more support to the bot maintainers, add bot maintainers, or bring the bots into WMF management

  • Phabricator ticket:
  1. None yet.
  2. T339145
  • Further remarks:

List of such bots and undone tasks:

  1. User:SteinsplitterBot: Maintenance of reports like Commons:Database reports/Abuse filter effectiveness, which has not been updated since 00:43, 06 October 2020 (UTC). Updates requested of User:Steinsplitter 11:42, 25 September 2022 (UTC) in an ignored post archived to User talk:Steinsplitter/Archive/2022#Commons:Database reports/Abuse filter effectiveness.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 01:05, 10 December 2023 (UTC)Reply[reply]
    Also, per User:SteinsplitterBot/Rotatebot, Steinsplitter's SteinsplitterBot has not performed the functions of User:Rotatebot since 01:15, 28 December 2023 (UTC) (11 days, 17 hours, 37 minutes ago).   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 18:52, 8 January 2024 (UTC)Reply[reply]
  2. Commons deletion notification bot, which notifies talk pages on other WMF wikis about images that are up for deletion on Commons, has been broken since 2023-06-06. See T339145. Toohool (talk) 19:13, 10 December 2023 (UTC)Reply[reply]
    @Toohool: MusikAnimal (WMF) started that task. It has needed discussion since Jun 21 2023, 10:07 AM. This is what can happen under WMF management.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 19:23, 10 December 2023 (UTC)Reply[reply]

Drafted by   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 01:05, 10 December 2023 (UTC)Reply[reply]

"Some bots don't do what they used to do." This is a very bad problem description. It casts a very wide net, that has no defined boundaries to the problem to be solved. It is much better to have specific items for specific bots. —TheDJ (talkcontribs) 12:16, 18 December 2023 (UTC)Reply[reply]
@TheDJ: They are not maintaining the site in the manner to which we have become accustomed. I am sure that more will be listed.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 13:46, 18 December 2023 (UTC)Reply[reply]
In other words, these are bots that perform tasks essential for the maintenance of Commons but are maintained only by users. If they stop working and users cannot fix them in time, Commons is paralysed. Such things have happened quite often.
These bots are all quite important:
  1. User:ArchiverBot and User:SpBot archive discussions on demand (placement of their templates).
  2. a few other bots archive specific discussions.
  3. User:SteinsplitterBot/Rotatebot rotates files.
  4. User:QICbot COM:QI
  5. User:CommonsDelinker
RZuo (talk) 01:01, 1 January 2024 (UTC)

Discussion

  • I want to run a bot, but there seem to be too many hurdles to clear. I would appreciate it if somebody created a more beginner-friendly tutorial. --トトト (talk) 13:06, 18 December 2023 (UTC)

File upload stability

Description of the Problem

  • Problem description: When uploading files using the UploadWizard or the API, users very frequently experience problems that result in aborted uploads or broken files. When an error goes unrecognized, broken files may end up on Commons or file description page information may be lost. When errors are recognized, they are very inconvenient for uploaders, driving away long-term contributors and scaring off new ones.
  • Proposal type: bugfix
  • Proposed solution: Define the goal that at most 1 in 10,000 uploads using the API should fail because of server-side problems, and at most 1 in 1,000 uploads in the web browser should fail because of server or website errors.
  • Further remarks: Feel free to add other relevant tickets. GPSLeo (talk) 14:05, 9 December 2023 (UTC)
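To make the proposed targets concrete, here is a minimal sketch of how measured upload counts could be checked against them. The `UploadStats` structure, the channel names "api" and "browser", and the sample counts are assumptions for illustration only; the proposal itself only states the two ratios. Integer arithmetic is used so the comparison does not depend on floating-point rounding.

```python
from dataclasses import dataclass

# Targets from the proposal: at most one server-side failure per N uploads.
# The channel names are hypothetical labels.
TARGET_UPLOADS_PER_FAILURE = {"api": 10_000, "browser": 1_000}


@dataclass
class UploadStats:
    channel: str   # "api" or "browser"
    attempts: int  # uploads attempted in the measurement window
    failures: int  # uploads that failed because of server or website errors


def meets_target(stats: UploadStats) -> bool:
    """True if failures/attempts <= 1/N for the channel's target N."""
    n = TARGET_UPLOADS_PER_FAILURE[stats.channel]
    # failures / attempts <= 1 / n  is equivalent to  failures * n <= attempts
    return stats.failures * n <= stats.attempts


# Made-up numbers, only to show the check.
for stats in (UploadStats("api", 50_000, 5), UploadStats("browser", 2_000, 3)):
    verdict = "within target" if meets_target(stats) else "above target"
    print(stats.channel, verdict)
```

A real measurement would also have to decide what counts as a server-side failure (as opposed to, say, a dropped client connection), which the proposal leaves open.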

Discussion

  • Diesem Vorschlag schließe ich mich aus tiefstem Herzen an. Insbesondere der UploadWizard könnte die Server-Fehlermeldungen viel verständlicher darstellen und viele auch besser abfangen. Ich möchte auch nochmals auf das Android-Tool Offroader hinweisen, das zeigt, wie stabil Uploads auf Commons mit der vorhandenen Server-Implementierung selbst unter widrigsten Bedingungen sein können, dass ein abgebrochener Upload ohne weiteres - auch auf einem anderen Gerät und mit einem anderen Internetzugang fortgesetzt werden kann, dass Uploads auf Fehlerfreiheit verifiziert werden können, dass Duplikate bereits vor Beginn eines Uploads erkannt und verhindert werden können und das - als Hilfe fürs Entwickeln, die Server-Meldungen während eines Uploads mitschneiden kann für ein PostMortem. --C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 18:41, 9 December 2023 (UTC)Reply[reply]
I agree with this proposal from the bottom of my heart. The UploadWizard in particular could present the server error messages much more comprehensibly and also handle many of them better. I would also like to point again to the Android tool Offroader, which shows how stable uploads to Commons can be with the existing server implementation, even under the most adverse conditions: that an interrupted upload can readily be resumed, even on a different device and with a different Internet connection; that uploads can be verified to be error-free; that duplicates can be detected and prevented before an upload even begins; and that, as an aid to development, the server messages during an upload can be recorded for a post-mortem.
translator: Google Translate via   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 00:27, 10 December 2023 (UTC)Reply[reply]
Yes, this is sorely needed! I like the idea of having target metrics especially. Nosferattus (talk) 17:40, 21 December 2023 (UTC)
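As an illustration of the duplicate detection mentioned above, a client can hash a file locally and ask the MediaWiki API whether a file with the same SHA-1 already exists before uploading anything. This is a sketch under assumptions, not a description of how Offroader or the UploadWizard work: it relies on the `list=allimages` query module with its `aisha1` parameter, the helper names are hypothetical, and no request is actually sent.

```python
import hashlib
import io
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def sha1_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large media files need not fit in memory."""
    digest = hashlib.sha1()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def duplicate_query_url(sha1_hex: str) -> str:
    """Build a query that lists existing files with the given SHA-1."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def has_duplicate(response: dict) -> bool:
    """Interpret a parsed JSON response: any listed file means a duplicate exists."""
    return bool(response.get("query", {}).get("allimages"))


# Usage with an in-memory stand-in for a local file.
digest = sha1_of_stream(io.BytesIO(b"abc"))
print(duplicate_query_url(digest))
```

The same hash can be compared after the upload against the SHA-1 the server reports for the stored file, which is one way to verify that an upload arrived intact.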

Taking on certain upload tools

Description of the Problem

  • Problem description: Certain tools, many of which are not part of the mediawiki itself, are nonetheless very basic for people who upload files to Commons. Many of these are currently each maintained by a single individual. We need a plan for more robust maintenance of these over time.
  • Proposal type: process request
  • Proposed solution: a program manager at WMF should be responsible for a plan for maintenance (or replacement) of these tools going forward. I (Jmabel) am not trying to dictate a particular technical solution here, just to have some entity that is not "the community" take primary responsibility. If this is best done by a paid team at WMF, great. If this is best done by a better-organized and "deeper" pool of volunteers, great. And some might best be left to exactly whoever is doing them now, but if that is a single individual we need at least a plan as to what should happen if that individual becomes unavailable. If it's some mix of the above, or even third parties like the Flickr Foundation, great. And if individuals want to contribute on their own, and the community can adopt their tools or not, that's also great. But I think we need program management from within WMF so that someone has the job of making overall status visible and making sure the ball doesn't get dropped.

Initially, we need to identify what tools would have this status. People are welcome to add to this initial list (and/or clarify situations), but please stick to existing (or previously existing and now broken) tools used by contributors who upload content.

  1. Special:UploadWizard: as I understand it, this is part of mediawiki, and is already maintained by WMF staff
  2. Special:Upload: as I understand it, this is part of mediawiki, and is already maintained by WMF staff
  3. Uploading apps for mobile devices (I know nothing here, I never use them, can someone please fill this in?)
  4. Flickr2Commons: the Flickr Foundation has already taken on the task of replacing this with a more robust tool, which I think means this is well covered
  5. Batch uploader(s) (programs running on a PC): there have been several of these over the years, notably Commonist, which I believe is dead. I have no idea of the current status here
    1. Pattypan: for batch upload via spreadsheets, some issues but working, developed by Yarl and maintained by Abbe98
    2. Vicuna Uploader
  6. tool(s) for mass uploads from GLAMs or other databases of file content: I have no idea of the status of these
  7. Video2Commons: especially important because of its ability to convert file formats. This is often broken to one degree or another. See phab:T353659
  8. CropTool: (rotating and cropping, either for overwrite or for a new file). Currently in danger of breaking because the Grid Engine is about to go away and no one has dealt with this.
  9. Url2Commons: for direct upload from the given URL: written by Magnus Manske but not actively maintained (many unresolved issues)
  10. Commons:derivativeFX, tool at https://iw.toolforge.org/derivative: to easily upload derivative works
  11. IA-upload: used to upload PD works on the Internet Archive to Commons as DJVU files. Some commons issues like: phab:T300761.
  12. The API itself using pywikibot or custom scripts
  • Phabricator ticket:
  • Further remarks: I'm very open to "sympathetic edits" to the above proposal, but reserve the right to revert edits that I think hijack my proposal to be something else. - Jmabel ! talk 22:31, 6 December 2023 (UTC)Reply[reply]
    • I have added some tools. — Draceane talkcontrib. 09:31, 7 December 2023 (UTC)Reply[reply]
    • I added one too.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 00:39, 10 December 2023 (UTC)Reply[reply]
    • In thinking about uploads, it is worth considering various (overlapping, variously combined) groups of users. Some of the considerations include:
      1. Experienced or not
      2. PC vs. tablet vs. phone
      3. Uploading own photos vs. GLAM content vs. other third party
      4. Uploading photos where many photos share a description etc., vs. each being unique
Jmabel ! talk 19:13, 17 December 2023 (UTC)Reply[reply]
@Jmabel: With the ideal tool, everything entered by the user should be sharable in the upload session: all or part of the description, source, author, templates, cats, freeform stuff after the description, freeform stuff before the cats... This could follow the model of the granularity of global preferences vs. local preferences.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 22:25, 17 December 2023 (UTC)Reply[reply]

Discussion

@Jmabel: (or anyone else). Is there some reason why these things are done through third party solutions instead of just being integrated into the website to begin with? Like is there a reason it's better to have the WMF maintain the CropTool instead of them just making cropping an actual feature of mediawiki? --Adamant1 (talk) 11:15, 7 December 2023 (UTC)Reply[reply]

If these tools are also available via the API, then there is no reason not to make them a feature of mediawiki. But batch uploading via a GUI-only tool is no fun. C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 16:33, 7 December 2023 (UTC)
As I say, I'm not prejudging the technical solution here. Obviously, if something can be brought into mediawiki and provide essentially the existing capability, that's great, and also benefits other sites using mediawiki. What I am saying is that for Commons, all of the above constitute part of the core functionality that we provide to uploaders, and that this deserves the same level of program management and, ultimately, robustness as the content editing that is core functionality across the sister projects. - 18:46, 7 December 2023 (UTC)
Thanks for the clarification. I'm certainly not against the proposal. I was just wondering about the trade offs between having them manage the applications in house versus just building similar features into mediawiki. I guess they aren't mutually exclusive though. --Adamant1 (talk) 13:27, 9 December 2023 (UTC)Reply[reply]
I think that is pretty unavoidable. Different problems require different dedicated tooling to emphasise different aspects that help with the problem. —TheDJ (talkcontribs) 12:49, 18 December 2023 (UTC)Reply[reply]
@Adamant1 Working tools the WMF deems useful for all MediaWiki installations are in Core. Working tools the WMF deems useful for some MediaWiki installations are in Extensions. Working tools the WMF deems useful for all WMF MediaWiki installations are in WMF Builds. Working tools developed by others who saw a need and filled it could be upgraded to any of the above. As far as I know.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 00:49, 10 December 2023 (UTC)Reply[reply]
  •  Comment If there is interest in going deeper into this topic, we could hold a vote in which users rank the tools by how much they need them. GPSLeo (talk) 07:50, 22 December 2023 (UTC)
    @Jmabel Do you agree to hold a vote on the most needed and used upload tools out of this list? GPSLeo (talk) 09:59, 5 January 2024 (UTC)
    As I mentioned in another thread, there are three distinct types of users who upload: about 2 million users who have uploaded fewer than 20 files each since Commons was started (one million of whom have uploaded only one file ever); 315 users who uploaded 58% of all files that have been uploaded to Commons; and a third group, who uploaded between 20 and 40,000 files each. All three groups have very different needs for an upload tool, and they probably differ greatly in how likely they are to take part in any vote or even know about one. A vote should be planned with this difference in view. IMHO only members of the 315 group are actually participating in the discussions. The group of 2 million are the ones whose opinions count outside of the Wikipedia bubble. And the third group (who uploaded 40% of all files) are the people who would benefit the most from better tools. C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 11:52, 5 January 2024 (UTC)
    We should ask which tool they see as most relevant, not which tool they are using. I do not use the UploadWizard, but as a Wikiloves contest organizer this is the most important tool for me even though I do not use it to upload files myself. GPSLeo (talk) 13:35, 5 January 2024 (UTC)
    • @GPSLeo: I'm all for a poll, but the fact remains: what is most valuable isn't always what's easy, and I wouldn't expect any development team to be driven entirely by value of features without weighing difficulty of implementation. - Jmabel ! talk 20:07, 5 January 2024 (UTC)Reply[reply]