<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.python.catalog">
    <title>gmane.comp.python.catalog</title>
    <link>http://blog.gmane.org/gmane.comp.python.catalog</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5660"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5655"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5637"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5626"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5624"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5623"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5620"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5618"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5598"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5597"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5595"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5582"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5576"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5575"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5559"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5534"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5514"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5494"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5490"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.catalog/5455"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5660">
    <title>Reminder: catalog-sig is retired</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5660</link>
    <description>&lt;pre&gt;Please direct all followups / CCs to distutils-sig.


Thank you,

    Richard
_______________________________________________
Catalog-SIG mailing list
Catalog-SIG&amp;lt; at &amp;gt;python.org
http://mail.python.org/mailman/listinfo/catalog-sig
&lt;/pre&gt;</description>
    <dc:creator>Richard Jones</dc:creator>
    <dc:date>2013-04-01T06:41:10</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5655">
    <title>Shutting down catalog-sig</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5655</link>
    <description>&lt;pre&gt;Hi all,

We're about to merge the catalog-sig and distutils-sig by just removing the
catalog-sig mailing list. If you wish to remain in the discussions
regarding Python package cataloging then please subscribe to the distutils
SIG.

The catalog SIG archives will remain, but the mailing list will be deleted
and the SIG will be retired.

There's no real timeframe but it will be happening imminently.


     Richard
&lt;/pre&gt;</description>
    <dc:creator>Richard Jones</dc:creator>
    <dc:date>2013-03-30T22:16:38</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5637">
    <title>How to determine if archive is an sdist or bdist</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5637</link>
    <description>&lt;pre&gt;Is there an easy way to programmatically tell if an archive (tar.gz, zip,
etc.) in the dist directory is a binary or sdist? I would like to
post-process the contents of a dist directory and classify each build
artifact there (egg, sdist, bdist, etc.).

Currently the only approach I know of is to have my own command that is run
along with the relevant build command.  For example:

python setup.py sdist be_funky

or:
python setup.py sdist bdist bdist_egg be_funky

Using this approach the tuples in  self.distribution.dist_files provide the
command, python version and file created. Unfortunately this solution is
slightly more complicated in my use case than simply having an easy way to
classify each build artifact and extract its pkg-info.
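
One possible heuristic, offered only as a sketch and not as an official distutils facility: a distutils sdist normally unpacks into a single top-level directory that holds a PKG-INFO file, an egg carries EGG-INFO/PKG-INFO, and anything else can be treated as some other bdist format. The function names (list_members, classify) are hypothetical, and the rules are assumptions about typical archive layouts that may misclassify unusual archives.

```python
import tarfile
import zipfile


def list_members(path):
    """Return the member names of a .zip/.egg or tar archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    with tarfile.open(path) as archive:
        return archive.getnames()


def classify(filename, members):
    """Guess the artifact type from its file name and member names."""
    # Dumb bdist tarballs often prefix every member with "./".
    names = [m[2:] if m.startswith("./") else m for m in members]
    if filename.endswith(".egg") or "EGG-INFO/PKG-INFO" in names:
        return "bdist_egg"
    for name in names:
        parts = name.split("/")
        # Assumption: an sdist has exactly one directory level above PKG-INFO.
        if len(parts) == 2 and parts[1] == "PKG-INFO":
            return "sdist"
    return "bdist"
```

Calling classify(name, list_members(path)) on each file in the dist directory would give a rough classification without running a custom setup.py command.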
&lt;/pre&gt;</description>
    <dc:creator>James Carpenter</dc:creator>
    <dc:date>2013-03-28T19:57:09</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5626">
    <title>Merge catalog-sig and distutils-sig</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5626</link>
    <description>&lt;pre&gt;Is there much point in keeping catalog-sig and distutils-sig separate?

It seems to me that most of the same people are on both lists, and the topics almost always have consequences for both sides of the coin. So much so that it's often hard to pick *which* of the two lists (or both) to post to. This is further confused by the fact that distutils is hopefully going to go away someday :)

Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig the umbrella for all packaging topics.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

&lt;/pre&gt;</description>
    <dc:creator>Donald Stufft</dc:creator>
    <dc:date>2013-03-28T18:22:59</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5624">
    <title>PEP 438 progress update</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5624</link>
    <description>&lt;pre&gt;Hi all,

It was my intention to formally accept the PEP and deploy the
implementation to the production PyPI when I got back home this week,
but things have been quite hectic and I've not found the time to
perform the pre-deployment tasks needed (specifically steps 5, 6 and 7
of the first transition phase; determining the various email lists I
need to inform people of the changes.) At the moment I'm not sure when
I'll have time to do that; hopefully next week some time but it could
be as long as four weeks before I get sufficient tuits(round).


    Richard
&lt;/pre&gt;</description>
    <dc:creator>Richard Jones</dc:creator>
    <dc:date>2013-03-28T00:29:54</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5623">
    <title>c.pypi.python.org - IP address change</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5623</link>
    <description>&lt;pre&gt;
Hi there,

I moved my c.pypi.python.org mirror to a new faster machine.

Please update the DNS entry to 176.9.146.29.

This mirror is running on top of Christian Theune's bandersnatch
implementation.

Andreas

-- 
ZOPYX Limited         | Python | Zope | Plone | MongoDB
Hundskapfklinge 33    | Consulting &amp;amp; Development
D-72074 Tübingen      | Electronic Publishing Solutions
www.zopyx.com         | Scalable Web Solutions
--------------------------------------------------
Produce &amp;amp; Publish - www.produce-and-publish.com


&lt;/pre&gt;</description>
    <dc:creator>Andreas Jung</dc:creator>
    <dc:date>2013-03-27T15:54:19</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5620">
    <title>Subscribing to PyPI projects</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5620</link>
    <description>&lt;pre&gt;
I wonder if it would be too difficult to be able to subscribe to projects
on PyPI, to be notified when a new version is available.

An option for pip &amp;amp; family to compare local versions with PyPI versions,
and report outdated versions, would be useful too.

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea&amp;lt; at &amp;gt;jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: &amp;lt; at &amp;gt;jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea&amp;lt; at &amp;gt;jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
&lt;/pre&gt;</description>
    <dc:creator>Jesus Cea</dc:creator>
    <dc:date>2013-03-26T20:07:43</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5618">
    <title>error trying to upload by package</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5618</link>
    <description>&lt;pre&gt;Hi All,

I have a package called files: https://github.com/Simplistix/files

...but I get a 403 when I try to register it on PyPI.

Why is that?

cheers,

Chris

&lt;/pre&gt;</description>
    <dc:creator>Chris Withers</dc:creator>
    <dc:date>2013-03-26T14:02:21</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5598">
    <title>PyPI Crediting</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5598</link>
    <description>&lt;pre&gt;Does anybody think that PyPI source code base should include the names of
the people who contributed to its development?
&lt;/pre&gt;</description>
    <dc:creator>anatoly techtonik</dc:creator>
    <dc:date>2013-03-22T08:32:00</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5597">
    <title>PyPI web interface UX (Was: API for uploading packages to PyPI)</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5597</link>
    <description>&lt;pre&gt;OMG. I didn't even look at the boxes. IMHO somebody should reduce the
amount of duplication and choices between menu and boxes. It is
really-really overburdened. For example, no need to say "use search above"
when it is evident that you need to find the button, or to use that "browse
all packages" link when it is actually the first item on the menu.

RSS should be moved out of scope of main menu. It is just "yikes!".
The whole menu section with links to main Python web site should be moved
out of place (is there a quick way to measure how many users follow these
links)?

Yes, I can send a patch if everyone agrees.
&lt;/pre&gt;</description>
    <dc:creator>anatoly techtonik</dc:creator>
    <dc:date>2013-03-22T08:26:25</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5595">
    <title>API for uploading packages to PyPI</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5595</link>
    <description>&lt;pre&gt;Hi,

I understand that this will make PyPI a potential target for automated spam
bots, but still it will be awesome to have an API to upload packages to
PyPI.

For example, I have code that extracts all the necessary metadata for the
package from the source file itself. It is even able to generate setup.py
from this data. https://bitbucket.org/techtonik/astdump The next logical
step in this chain is to teach it to upload stuff to PyPI.

Now I thought that this setup.py is an unnecessary complication. What I
need, ideally, is to upload just a single .py file, or a JSON and a .tar.gz FWIW.
Is there a straightforward API for things like that?

Please, CC.
&lt;/pre&gt;</description>
    <dc:creator>anatoly techtonik</dc:creator>
    <dc:date>2013-03-22T07:37:18</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5582">
    <title>Access to Windows' cert store</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5582</link>
    <description>&lt;pre&gt;Hi,

the message is slightly off-topic but it might be interesting for pip,
setuptools and other developers that are working on HTTPS for PyPI.

A while ago I found C++ example code that shows how to dump CA and CRL
certs from Windows' system cert store. The system cert store contains
the certificates used by Windows, IE etc.

Yesterday I reimplemented the C++ code with Python and ctypes. I have
tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It
should work with Windows XP / Windows Server 2003 and all newer versions
of Windows. The output is usable by Python's SSL module but you have to
dump the certs to a file first.

I'm planning to add the feature to Python 3.4, too.
http://bugs.python.org/issue17134

You can download the code from

  https://bitbucket.org/tiran/wincertstore


Regards,
Christian
&lt;/pre&gt;</description>
    <dc:creator>Christian Heimes</dc:creator>
    <dc:date>2013-03-21T12:06:07</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5576">
    <title>Updated PEP 438</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5576</link>
    <description>&lt;pre&gt;I've pushed the latest PEP to the repos. It has all the recent
clarifications and the API docs. Just need to wait for the website to
rebuild or something.

Unless there's any last-minute problems I'll accept the PEP in this
form and push the implementation to the production PyPI next week
after I fly home.


     Richard
&lt;/pre&gt;</description>
    <dc:creator>Richard Jones</dc:creator>
    <dc:date>2013-03-21T00:30:09</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5575">
    <title>Replacement client for pep381client</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5575</link>
    <description>&lt;pre&gt;Hi,

as you might be aware, I've done my share on bitching about my mirror 
(f.pypi.python.org) breaking.

I have picked pep381client apart yesterday and rebuilt it - mostly from 
ground up.

You can find a working version here:
https://bitbucket.org/ctheune/bandersnatch

The focus has been on making it a lot more robust and a lot easier to 
repair a mirror when it's known to be broken. To achieve that I:

- refactored the code, trying to make it more intentional, less mechanical
- stopped parsing the simple pages' html and made more use of the XML-RPC API
- added Tarek's worker/queue approach for parallelizing it
- kept as little state as possible on the client
- switched from timestamps to serial counters for checking what and how 
much to update
- handled locking of concurrent runs more gracefully

I think I have a good grasp of what's going on now so that I can keep 
maintaining this in the future.

I'm currently re-initializing my own mirror. This basically can be run 
in-place by just removing the existing state data and calling my sync 
script (bsn-mirror) instead of pep381run with the same parameters.

Tomorrow I'll update the documentation, make it use a config file and 
put some lipstick on the main entry point. After that I should be ready 
for a release.

If you want to give it a try already, you just do this:

$ hg clone https://bitbucket.org/ctheune/bandersnatch
$ cd bandersnatch
$ virtualenv-2.7 .
$ bin/python bootstrap.py
$ bin/buildout
$ bin/bsn-mirror /my/mirror/path

Cheers,
Christian
&lt;/pre&gt;</description>
    <dc:creator>Christian Theune</dc:creator>
    <dc:date>2013-03-20T23:59:21</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5559">
    <title>PEP 438 implementation on testpypi</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5559</link>
    <description>&lt;pre&gt;Thanks to Donald Stufft for his implementation of the PEP 438 changes,
I've made them live on testpypi.python.org - specifically the "urls"
page of package administration. Please poke and play.


     Richard
&lt;/pre&gt;</description>
    <dc:creator>Richard Jones</dc:creator>
    <dc:date>2013-03-20T18:26:45</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5534">
    <title>V4 Pre-PEP: transition to release-file hosting on PYPI</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5534</link>
    <description>&lt;pre&gt;Hi all, in particular Philip, Marc-Andre, Donald,

Carl and I decided to simplify the PEP and avoid the somewhat
awkward ``simple/-with-externals`` index for various reasons, among them
Marc-Andre's criticisms.  This also means present-day installation tools
(shipped with Redhat/Debian/etc.) will continue to work as today for
those packages which remain in a hosting-mode that requires crawling and
scraping.  They will still benefit from the fact that most packages will
soon have a hosting-mode that avoids it.  Future releases of installation
tools will default to not crawling or using (scraped) external
links, and new PyPI projects will default to serving only uploaded files.

The V4 pre-PEP also renames the three PyPI hosting modes to be more
descriptive. Since all three modes allow external links, "pypi-ext" vs
"pypi-only" were misleading. The new naming distinguishes the mode that both
scrapes links from metadata and crawls external pages for more links
("pypi-scrape-crawl") from the mode that only scrapes links from metadata
("pypi-scrape") from the mode where all links are explicit ("pypi-explicit").

Without the separate external index, it also turns out that the two transition
phases separate cleanly into PyPI changes (phase one) and installer-tool
updates (phase two). There are no PyPI changes necessary in phase two.
As stated in a new open question, it should be possible to do
PEP-related installation tool updates during phase one; the PEP's
language may still need a bit of clarification on this point.

Carl and I are happy with this PEP version now and hope you all are as
well.  Donald is already working on improving the analysis tool so
we hopefully have some updated numbers soon.

cheers,

Holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel &amp;lt;holger&amp;lt; at &amp;gt;merlinux.eu&amp;gt;, Carl Meyer &amp;lt;carl&amp;lt; at &amp;gt;oddbird.net&amp;gt;
Discussions-To: catalog-sig&amp;lt; at &amp;gt;python.org
Status: Draft (PRE-submit V4)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to speed up, simplify and robustify installing from the
pypi.python.org (PyPI) package index.  To ease the transition and
minimize client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the first
transition phase, which will result in faster, more reliable installs
for most existing packages**.

The first transition phase implements an easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools.  The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links.  The first phase
will also make new projects on PyPI default to serving only
links to release files that were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to only install release files that are hosted on
PyPI and tell the user if external release files exist, offering
a choice to automatically use those external files.


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for 
   any release. Links in the "Download-URL" and "Home-page" metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form "packagename-version.ARCHIVEEXT", is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.
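
As a rough illustration of steps 2 and 3 above, the candidate test can be sketched in Python. This is not the actual setuptools or pip logic: the function name is_candidate and the ARCHIVE_EXTS tuple are assumptions made for the example, since the text only speaks of "ARCHIVEEXT" without listing extensions.

```python
from urllib.parse import urlparse

# Assumed set of archive extensions; the text only says "ARCHIVEEXT".
ARCHIVE_EXTS = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


def is_candidate(url, package):
    """Return True if url looks like a release file for package."""
    parsed = urlparse(url)
    prefix = package.lower() + "-"
    # Step 3: an "#egg=packagename-version" fragment marks a candidate.
    if parsed.fragment.lower().startswith("egg=" + prefix):
        return True
    # Step 2: the file name has the form "packagename-version.ARCHIVEEXT".
    filename = parsed.path.rsplit("/", 1)[-1].lower()
    return filename.startswith(prefix) and filename.endswith(ARCHIVE_EXTS)
```

Real installers apply more careful name and version parsing; the sketch only shows why any scraped link with a matching file name or fragment ends up being treated as installable.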

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages
  through own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived bad reliability of PyPI

- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there is
clearly a history behind why people chose to host files externally, and
for some time it was even the only way to do things.  This PEP
takes the position that there are at least some valid reasons for
external hosting.

Problem
-------

**Today, Python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**.  Apart from querying pypi.python.org's
simple index pages, an installer also crawls all homepages and download
pages ever specified with any release of a package.
The need for installers to crawl external sites slows down
installation and makes for a brittle and unreliable installation
process.  Those sites and packages also don't take part in the
:pep:`381` mirroring infrastructure, further decreasing reliability
and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_.  Even for
these packages, installers still crawl their homepage and
download-url, if specified.  Many package uploaders are not aware that
specifying the "homepage" or "download-url" in their package metadata
will needlessly slow down the installation process for all users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs.  A
simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there.  Moreover,
performing a Man-in-The-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site.  As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch.  Such
MITM attacks can easily happen even for packages which never intended
to host files externally as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid
external-link crawling, other than removing all homepage/download url
metadata for all historic releases.  While a script [3]_ has been
written to perform this action, it is not a good general solution
because it removes useful metadata from PyPI releases.

Even if the sites referenced by "Homepage" and "Download-URL" links were 
not scraped for further links, there is no obvious way under the current
system for a package owner to link to an installable file from a 
long_description metadata field (which is shown as package documentation
on ``/pypi/PKG``) without installation tools automatically considering
that file a candidate for installation.  Conversely, there is no way
to explicitly register multiple external release files without
putting them in metadata fields.


Goals
-----

These are the goals to be achieved by implementation of this PEP:

* Package owners should be able to explicitly control which files are
  presented by PyPI to installer tools as installation
  candidates. Installation should not be slowed and made less reliable
  by extensive and unnecessary crawling of links that package owners
  did not explicitly nominate as installation files.

* It should remain possible for package owners to choose to host their
  release files on their own hosting, external to PyPI. It should be
  easy for a user to request the installation of such releases using
  automated installer tools.

* Automated installer tools should not install externally-hosted
  packages **by default**, but only when explicitly authorized to do
  so by the user. When tools refuse to install such a package by
  default, they should tell the user exactly which external link(s)
  they would need to follow, and what option(s) the user can provide
  to authorize the tool to follow those links. PyPI should provide all
  necessary metadata for installer tools to implement this easily
  and within a single request/reply interaction.

* Migration from the status quo to the above points should be gradual
  and minimize breakage. This includes tooling that makes it easy for
  package owners with an existing release process that uploads to
  non-PyPI hosting to also upload those release files to PyPI.  


Solution / two transition phases
================================

The first transition phase introduces a "hosting-mode" field for each
project on PyPI, allowing package owners explicit control of which
release file links are served to present-day installation tools in the
machine-readable ``simple/`` index. The first transition will, after
successful hosting-mode manipulations by individual early-adopters,
set a default hosting mode for existing packages, based on
automated analysis.  **Maintainers will be notified one month ahead of
any such automated change**.  At completion of the first transition
phase, **all present-day existing release and installation processes
and tools are expected to continue working**.  Any remaining errors or
problems are expected to only relate to installation of individual
packages and can be easily corrected by package maintainers or PyPI
admins if maintainers are not reachable.

Also in the first phase, each link served in the ``simple/`` index
will be explicitly marked as ``rel="internal"`` (hosted by the index
itself) or ``rel="external"`` (linking to an external site that is not
part of the index).

In the second transition phase, PyPI client installation tools shall
be updated to default to only install ``rel="internal"`` packages
unless a user specifies option(s) to permit installing from external
links.
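
A minimal sketch of the second-phase default, assuming an installer has already parsed the ``simple/`` index into (url, rel) pairs. The function name select_links and the allow_external flag are hypothetical and do not correspond to an existing pip or easy_install option.

```python
def select_links(links, allow_external=False):
    """Split (url, rel) pairs into links to use and links skipped."""
    usable, skipped = [], []
    for url, rel in links:
        # Only rel="internal" links are used unless the user opts in.
        if rel == "internal" or allow_external:
            usable.append(url)
        else:
            skipped.append(url)
    return usable, skipped
```

The skipped list is what a tool could show to the user, together with the option needed to permit those links.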

Maintainers of packages which currently host release files on non-PyPI
sites shall receive instructions and tools to ease "re-hosting" of
their historic and future package release files.  This re-hosting tool
MUST be available before automated hosting-mode changes are announced
to package maintainers.


Implementation
==============

Hosting modes
-------------

The foundation of the first transition phase is the introduction of
three "modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index.  These modes are implemented
without requiring changes to installation tools via changes to the
algorithm for generating the machine-readable ``simple/`` index.

The modes are:

- ``pypi-scrape-crawl``: no change from the current situation of
  generating machine-readable links for installation tools, as
  outlined in the history_.

- ``pypi-scrape``: for a package in this mode, links to be added to
  the ``simple/`` index are still scraped from package
  metadata. However, the "Home-page" and "Download-url" links are
  given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
  instead of ``rel=homepage`` and ``rel=download``. The effect of this
  (with no change in installation tools necessary) is that these links
  will not be followed and scraped for further candidate links by present-day
  installation tools: only installable files hosted directly on PyPI or
  linked directly from PyPI metadata will be considered for installation.
  Installation tools MAY evolve to offer an option to use the new 
  rel-attribution to crawl external pages but MUST NOT default to it.

- ``pypi-explicit``: for a package in this mode, only links to release
  files uploaded to PyPI, and external links to release files
  explicitly nominated by the package owner (via a new interface
  exposed by PyPI) will be added to the ``simple/`` index.
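
The effect of the three modes on the ``simple/`` index can be sketched as follows. This is an illustration under stated assumptions, not the PyPI implementation: the function name simple_links and its parameters are hypothetical, links are modelled as (url, rel) pairs with rel set to None for plain links, and whether explicitly registered links also appear in the two scrape modes is left out because the text does not say.

```python
def simple_links(mode, uploaded, scraped, homepage, download_url, explicit):
    """Return the (url, rel) pairs served for one project."""
    # Files uploaded to PyPI are served in every mode.
    links = [(url, None) for url in uploaded]
    if mode == "pypi-explicit":
        # Only uploads plus owner-nominated external release files.
        return links + [(url, None) for url in explicit]
    # Both scrape modes still serve links scraped from metadata.
    links += [(url, None) for url in scraped]
    if mode == "pypi-scrape-crawl":
        rels = ("homepage", "download")
    elif mode == "pypi-scrape":
        # Renamed rel values are not crawled by present-day tools.
        rels = ("ext-homepage", "ext-download")
    else:
        raise ValueError("unknown hosting mode: %r" % mode)
    for url, rel in zip((homepage, download_url), rels):
        if url:
            links.append((url, rel))
    return links
```

Switching a project from ``pypi-scrape-crawl`` to ``pypi-scrape`` changes only the rel values in this model, which is why no installer changes are needed for the crawling to stop.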

Thus the hope is that eventually all projects on PyPI can be migrated
to the ``pypi-explicit`` mode, while preserving the ability to install
release files hosted externally via installer tools. Deprecation of
hosting modes to eventually only allow the ``pypi-explicit`` mode is
NOT REGULATED by this PEP but is expected to become feasible some time
after successful implementation of the transition phases described in
this PEP.  It is expected that deprecation requires **a new process to deal 
with abandoned packages** because of unreachable maintainers for still
popular packages.


First transition phase (PyPI)
-----------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes described above, with an
   interface for package owners to select the mode for each package
   and register explicit external file URLs.

#. For packages in all modes, label all links in the ``simple/`` index
   with ``rel="internal"`` or ``rel="external"``, to make it easier
   for client tools to distinguish the types of links in the second
   transition phase.

#. Default all newly-registered packages to ``pypi-explicit`` mode
   (package owners can still switch to the other modes as desired).

#. Determine (via an automated analysis tool) which packages have all
   installable files available on PyPI itself (group A), which have
   all installable files linked directly from PyPI metadata (group B),
   and which have installable versions available that are linked only
   from external homepage/download HTML pages (group C).

#. Send mail to maintainers of projects in group A that their project
   will be automatically configured to ``pypi-explicit`` mode in one
   month, and similarly to maintainers of projects in group B that
   their project will be automatically configured to ``pypi-scrape``
   mode.  Inform them that this change is not expected to affect
   installability of their project at all, but will result in faster
   and safer installs for their users.  Encourage them to set this
   mode themselves sooner to benefit their users.

#. Send mail to maintainers of packages in group C that their package
   hosting mode is ``pypi-scrape-crawl``, list the URLs which
   currently are crawled, and suggest that they either re-host their
   packages directly on PyPI and switch to ``pypi-explicit``, or at
   least provide direct links to release files in PyPI metadata and
   switch to ``pypi-scrape``.  Provide instructions and tools to help
   with these transitions.
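
The grouping used in the steps above can be sketched as a small
classification function.  This is only an illustration: the function name
and the idea of passing three sets of file names are assumptions, and the
real analysis tool would first have to collect those sets by reading the
index and crawling the external pages.

```python
def classify_package(pypi_hosted, metadata_linked, crawled):
    """Assign a package to group A, B or C.

    Each argument is a set of installable file names found, respectively,
    on PyPI itself, linked directly from PyPI metadata, and discovered by
    crawling external homepage/download pages.
    """
    if crawled - pypi_hosted - metadata_linked:
        return "C"   # some files are reachable only via crawling
    if metadata_linked - pypi_hosted:
        return "B"   # all files directly linked, some hosted externally
    return "A"       # everything is available on PyPI itself


# Hosting mode each group ends up in, per the steps above.
TARGET_MODE = {
    "A": "pypi-explicit",
    "B": "pypi-scrape",
    "C": "pypi-scrape-crawl",
}
```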


Second transition phase (installer tools)
-----------------------------------------

For the second transition phase, maintainers of installation tools are
asked to release two updates. 

The first update shall provide clear warnings if externally-hosted
release files (that is, files whose link is ``rel="external"``) are
selected for download, for which projects and URLs exactly this
happens, and warn that in future versions externally-hosted downloads
will be disabled by default.

The second update should change the default mode to allow only
installation of ``rel="internal"`` package files, and allow
installation of externally-hosted packages only when the user supplies
an option (ideally an option specifying exactly which external domains
are to be trusted as download sources). When download of an
externally-hosted package is disallowed, the user should be notified,
with instructions for how to make the install succeed and warnings
about the implication (that a file will be downloaded from a site that
is not part of the package index).
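
A minimal sketch of the behaviour asked for in the second update, assuming
the installer has already parsed the index page into (url, rel) pairs.  The
function name and the ``allowed_external_hosts`` parameter are hypothetical
and do not correspond to an option of any existing tool.

```python
from urllib.parse import urlparse


def select_links(links, allowed_external_hosts=()):
    """Split (url, rel) pairs into accepted URLs and rejection messages."""
    accepted, rejected = [], []
    for url, rel in links:
        host = urlparse(url).hostname
        # Internal links are always fine; anything else needs an opt-in.
        if rel == "internal" or host in allowed_external_hosts:
            accepted.append(url)
        else:
            rejected.append(
                "skipping %s: externally hosted on %s, which is not part of "
                "the package index; allow this host explicitly to install "
                "from it" % (url, host)
            )
    return accepted, rejected
```

Keying the opt-in on host names rather than on a single on/off switch
follows the suggestion above that users name exactly which external
domains they trust.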


Open questions / Tasks
===========================

- Should we introduce some form of PyPI API versioning in this PEP?
  (it might complicate matters and delay the implementation but is
  often seen as good practise).

- In ``pypi-scrape`` mode: does PyPI itself determine which links are
  installation candidates and avoid presenting other random links (which
  are currently served)?

- Consider that installation tools may choose to release updates
  during transition phase 1 already, to warn about crawling and scraped
  links (which are easily identifiable today, and remain so with the new
  rel-attribution after transition phase 1).


References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html (XXX need to update this data for all easy_install-supported formats)

.. [2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for all releases http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments
================

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to
implement both a Pull Request for the necessary PyPI changes and the
analysis tool to drive the transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
thinking through issues regarding getting rid of "external hosting".

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
&lt;/pre&gt;</description>
    <dc:creator>holger krekel</dc:creator>
    <dc:date>2013-03-15T09:29:59</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5514">
    <title>ResponseNotReady error while trying to do fresh sync</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5514</link>
    <description>&lt;pre&gt;Hello,
I'm maintaining e.pypi.python.org (with Aron Xu).
We ran into some issues with our network attached storage, so we decided
to do a fresh sync of PyPI.
While doing that, we hit a problem:

we got an exception httplib.ResponseNotReady

similar to this mail
"http://mail.python.org/pipermail/catalog-sig/2013-February/005224.html"

For now, we skipped all packages with that issue and finished the sync.

But some files will be missing.

The three packages which cause that exception are listed below:
https://pypi.python.org/simple/iterator/
https://pypi.python.org/simple/nester_test_ling/
https://pypi.python.org/simple/nesterswe/

Please notify us when it gets fixed, so that we can update the mirror and
complete the sync.

Best Regards,
Qijiang Fan
&lt;/pre&gt;</description>
    <dc:creator>Qijiang Fan</dc:creator>
    <dc:date>2013-03-14T04:17:35</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5494">
    <title>V3 PEP-draft for transitioning to pypi-hosting of release files</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5494</link>
    <description>&lt;pre&gt;Hi all,

after some more discussions and hours spent by Carl Meyer (who is now
co-authoring the PEP) and me, here is a new V3 pre-submit draft.
It is now more ambitious than the previous draft, as should be obvious
from the modified abstract (and Carl Meyer's and Philip's earlier
interactions on this list).  There are also more details of how
the current link-scraping works, among other improvements and incorporations
of feedback from discussions here.

We intend to submit this draft tonight to the PEP editors.  

Feedback now and later remains welcome.  I am sure there are issues to 
be sorted and clarified, among them the versioning-API suggestion by 
Marc-Andre.

Thanks for everybody's support and feedback so far,
holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel &amp;lt;holger&amp;lt; at &amp;gt;merlinux.eu&amp;gt;, Carl Meyer &amp;lt;carl&amp;lt; at &amp;gt;oddbird.net&amp;gt;
Discussions-To: catalog-sig&amp;lt; at &amp;gt;python.org
Status: Draft (PRE-submit V3)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process to speed
up, simplify and robustify installing from the pypi.python.org (PyPI)
package index.  To ease the transition and minimize client-side
friction, **no changes to distutils or existing installation tools are
required in order to benefit from the transition phases, which is to
result in faster, more reliable installs for most existing packages**.

The first transition phase implements easy and explicit means for
a package maintainer to control which release file links are
served to present-day installation tools.  The first phase also
includes the implementation of analysis tools for present-day packages,
to support communication with package maintainers and the automated
setting of default modes for controlling release file links.

The second transition phase will result in the current PyPI index
serving only PyPI-hosted files by default.  Externally hosted files
will still be automatically discoverable through a second index.
Present-day installation tools will be able to continue working
by specifying this second index.  New versions of installation
tools shall default to only install packages from PyPI unless
the user explicitly wishes to include non-PyPI sites.



Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   anywhere in that package's metadata for any release. Links in the
   "Download-URL" and "Home-page" metadata fields are given
   ``rel=download`` and ``rel=homepage`` attributes, respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   basename in the form "packagename-version.ARCHIVEEXT", is considered 
   a potential installation candidate.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   followed and, if HTML, are themselves scraped for release-file links
   in the above formats.
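
The file-name and ``#egg=`` rules above can be sketched roughly as follows.
This is a simplified illustration, not the setuptools implementation: the
function name and the list of archive extensions are assumptions, and real
tools normalize project names and parse versions more carefully.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: an illustrative subset of recognized archive extensions.
ARCHIVE_EXTS = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")


def is_install_candidate(url, package):
    """Apply the file-name and #egg= rules to a single link."""
    parts = urlparse(url)
    prefix = package.lower() + "-"
    # Rule: a "#egg=packagename-version" fragment marks a candidate.
    if parts.fragment.lower().startswith("egg=" + prefix):
        return True
    # Rule: basename of the form "packagename-version.ARCHIVEEXT".
    basename = posixpath.basename(parts.path).lower()
    return basename.startswith(prefix) and basename.endswith(ARCHIVE_EXTS)
```

Links that fail both checks are not installed directly, but under the last
rule a ``rel=homepage`` or ``rel=download`` link is still fetched and its
HTML scraped with the same checks, which is where the crawling cost
comes from.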

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages
  through own sites

- problems with integrating uploading to PYPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived bad reliability of PYPI

- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there
clearly is a history why people choose to host files externally and it
even was for some time the only way you could do things.


Problem
-------

**Today, python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**.  Apart from querying pypi.python.org's
simple index pages, also all homepages and download pages ever
specified with any release of a package are crawled by an installer.
The need for installers to crawl external sites slows down
installation and makes for a brittle and unreliable installation
process.  Those sites and packages also don't take part in the
:pep:`381` mirroring infrastructure, further decreasing reliability
and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_.  Even for
these packages, installers still crawl the homepage(s) of a package.
Many package uploaders are not aware that specifying the "homepage" in
their release process will slow down the installation process for all
users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs.  A
simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there.  Moreover,
performing a Man-in-The-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site.  As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch.  Such
MITM attacks can easily happen even for packages which never intended
to host files externally as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid 3rd party
crawling, other than removing all homepage/download url metadata for
all historic releases.  While a script [3]_ has been written to
perform this action, it is not a good general solution because it
removes semantic information like the "homepage" specification from
PYPI packages.

Even if the "Homepage" and "Download-URL" links were not scraped for
further links, there is still no way under the current system for a
package owner to link to an installable file from their package
metadata without installation tools automatically considering that
file a candidate for installation.


Solution / two transition phases
================================

The first transition phase starts off by introducing a "hosting-mode"
field for each project on PyPI, allowing explicit control of which
machine-readable release file links are served to present-day
installation tools.  The first transition will, after successful
hosting-mode manipulations of individual early-adopters, then set a
default hosting mode for existing packages, based on automated analysis.
**Maintainers will be notified one month ahead of any such automated
change**.  At completion of the first transition phase, **all
present-day existing release and installation processes and tools are
expected to continue working**.  Any remaining errors or problems are
expected to only relate to installation of individual packages and can
be easily corrected by package maintainers or PyPI admins if maintainers
are not reachable.

**The second transition phase will then get PyPI, after a three month
warning period, to only serve links for PyPI-hosted packages under the
present-day ``simple/`` index**.  At this point, present-day installation
tools will not see externally hosted links anymore, unless they specify
a new ``simple/-with-externals`` index which PyPI MUST offer ahead of
the start of the second transition phase.  This new index contains
the external links as controlled by a package maintainer.  Moreover, PyPI
MUST also provide means to register and control download
links, independently from the current metadata and remote html-scraping
methods.  At completion of the second transition phase, all present-day
installation tools will, and all future installation tool releases SHALL,
default to only install PyPI-hosted packages unless a user specifies
option(s) to include external links or the external index.   If an
installation tool chooses to use the new ``simple/-with-externals/`` as
a default, it MUST warn a user with a precise message of which external
links were followed.

Maintainers of packages which currently host release files on non-PyPI
sites shall receive instructions and tools to ease "re-hosting" of
their historic and future package release files.  The implementation
of such a re-hosting tool is expected but NOT REQUIRED to be available 
at the beginning of phase 2.


Implementation
==============

The foundation of both transition phases is the introduction of three
"modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index in transition phase 1.  These modes
are implemented without requiring changes to installation tools via changes
to the algorithm for generating the machine-readable ``simple/`` index.

The modes are:

- ``pypi-ext-crawl``: no change from the current situation of generating
  machine-readable links for installation tools, as outlined in the
  history_.

- ``pypi-ext``: for a package in this mode, the "Home-page" and
  "Download-url" links added to the simple index are given
  ``rel=ext-homepage`` and ``rel=ext-download`` attributes instead of
  ``rel=homepage`` and ``rel=download``. The effect of this (with no
  change in installation tools necessary) is that these links will
  not be followed and scraped for further candidate links. Only installable
  files linked directly from PyPI metadata (wherever they are hosted) will be
  considered for installation.

- ``pypi-only``: for a package in this mode, only links to URLs on
  PyPI itself will be added to the simple index.

At the end of the warning period of transition phase 2, the ``simple/``
index will be restricted to only show links to URLs on PyPI itself, while the
``simple/-with-externals`` index will during both transition phases show
links to PyPI and any externals as controlled by the package maintainer
and the hosting-mode.

For a package in ``pypi-only`` mode, external links will no longer be
automatically scraped from metadata and added to the two indexes.
However, PyPI will expose an interface for package maintainers to
explicitly specify any number of URLs to externally hosted installable
files for a given release, and these URLs will be added to the
``simple/-with-ext`` index page for that project but NOT to the basic 
``simple/`` index page. Thus the ``-with-ext`` alternative index gives
package owners with good reason to host their packages elsewhere a
means to do so (even under the ``pypi-only`` package mode) and still
have that information reflected on PyPI in machine-readable form, allowing
installation tool users an explicit and easy choice of whether they wish
to read an index that includes externally-hosted packages or one that
does not.

The goal of this PEP is that eventually all projects on PyPI can be
migrated to the ``pypi-only`` mode, while preserving the ability to
install release files hosted from third parties in an automated manner.

Deprecation of hosting-modes to eventually only allow the ``pypi-only``
mode is NOT REGULATED by this PEP but is expected to become feasible
some time after successful implementation of the two transition phases
described in this PEP.


Implementation and interaction timeline
--------------------------------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes and the ``-with-ext`` index as
   described above, and an interface for package owners to select the
   mode for each package and register explicit external file URLs for
   the ``-with-ext`` index (for projects in the ``pypi-only`` mode).
   Default all newly-registered packages to ``pypi-only`` mode (but
   package owners can still switch to the other modes as
   desired). Implement in ``pep381client`` the mirroring of the
   ``-with-ext`` index pages.

#. Determine which packages have installable versions available that
   are linked only from homepage/download pages (group B) and which
   packages have all installable files available on PyPI itself (group
   A).

#. Send mail to maintainers of projects in group A that their project
   is going to be automatically configured to ``pypi-ext`` mode in one
   month.  Inform them that this change is not expected to affect
   installability of their project at all, but will result in faster
   and safer installs for their users.  Encourage them to set this
   mode (or ``pypi-only``) themselves earlier to benefit their users.

#. Send mail to maintainers of packages in group B that their package
   hosting mode is ``pypi-ext-crawl``, list the sites which currently
   are crawled, and suggest that they re-host their packages directly
   on PyPI and then switch to ``pypi-only``.  Provide instructions and
   tools to help with this "re-uploading" process.

In addition, maintainers of installation tools are asked to release
two updates.  The first one shall provide clear warnings if
externally-hosted packages (that is, packages at a URL whose domain
name differs from the domain name of the index URL in use) are
selected for download, state for which projects and URLs exactly this
happens, and warn that in future versions externally-hosted downloads
will be disabled by default.
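
The domain comparison behind the first update could look roughly like
this.  It is a sketch only: both function names are hypothetical, and it
compares full host names, so a mirror or sub-domain of the index would
count as external unless a tool decides otherwise.

```python
from urllib.parse import urlparse


def is_externally_hosted(file_url, index_url):
    """True if file_url's host differs from the index URL's host."""
    return urlparse(file_url).hostname != urlparse(index_url).hostname


def external_download_warning(project, file_url, index_url):
    """Return a warning string for an external download, else None."""
    if not is_externally_hosted(file_url, index_url):
        return None
    return ("%s: downloading externally hosted file %s; future versions "
            "will disable such downloads by default" % (project, file_url))
```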

The second update for installation tools should change the default
mode to allow only installation of package files hosted at the index
domain, and allow installation of externally-hosted packages only when
the user supplies an option (ideally an option specifying exactly
which external domains are to be trusted as download sources). When
download of an externally-hosted package is disallowed, the user
should be notified, with instructions for how to make the install
succeed and warnings about the potential consequences.

It is expected that tools in this release may choose to change the
default index url to ``https://pypi.python.org/simple/-with-ext`` in
order to support explicitly-registered external URLs for projects in
``pypi-only`` mode. Tools may choose to do this only when the user
requests installation of externally-hosted packages, or may choose to
do this in all cases so as to be able to notify users when an
externally-hosted file is available.

Specific timelines for deprecation of ``pypi-ext-crawl`` and
``pypi-ext`` modes are not mandated in this PEP; this will depend on
observed behavior of package owners and availability of tooling. It is
expected that ``pypi-ext-crawl`` mode will be an early candidate for
deprecation; it may be necessary to leave ``pypi-ext`` mode in place 
for quite some time, at least for those packages already
depending on it (it may be removed as an option for new packages when
tool support for explicit external URLs and the ``-with-ext`` index is
sufficient).



Open questions
==============

- Should we introduce a third index which maintains the old behaviour
  of providing links irrespective of a maintainer's hosting-mode choice?

- should we introduce some form of PYPI API versioning in this PEP?
  (it might complicate matters and delay the implementation but is 
  often seen as good practise)


References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html (XXX need to update this data for all easy_install-supported formats)

.. [2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for all releases http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgements
================

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and
offering to implement both a Pull Request for the necessary PyPI changes
and the analysis tool to drive the transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for 
thinking through issues regarding getting rid of "external hosting".

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
&lt;/pre&gt;</description>
    <dc:creator>holger krekel</dc:creator>
    <dc:date>2013-03-13T11:21:59</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5490">
    <title>A modest proposal for securing PyPI with TUF</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5490</link>
    <description>&lt;pre&gt;Hello everyone,

I am pleased to announce our demonstration of PyPI and pip with TUF.

Firstly, we solicit your thoughts and comments on our design document 
for integrating PyPI with TUF:

https://docs.google.com/document/d/1sHMhgrGXNCvBZdmjVJzuoN5uMaUAUDWBmn3jo7vxjjw/edit?usp=sharing

Secondly, you may wish to test our demo of PyPI and pip with TUF:

https://github.com/dachshund/pip/wiki/pip-over-TUF

Thirdly, this is how little it takes to secure pip with TUF:

https://github.com/dachshund/pip/compare/develop...tuf

Finally, you may be interested to learn about how one might manually 
secure a PyPI package index with TUF:

https://github.com/dachshund/pip/wiki/PyPI-over-TUF

We are excited to be able to show this to you now, and in person at our 
lightning talk at PyCon this Friday.

We think that there is great potential for the PyPI and TUF community to 
work together to secure Python package management. This is just the 
beginning, and there is some work left to do, but we are confident that 
we have demonstrated to you that PyPI could be secured with TUF in the 
very near future. We would be happy to discuss with you how we compare 
with other proposals.

We look forward to your questions and feedback!

Thanks,
Trishank
&lt;/pre&gt;</description>
    <dc:creator>Trishank Karthik Kuppusamy</dc:creator>
    <dc:date>2013-03-13T06:41:55</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5455">
    <title>setuptools/distribute/easy_install/pkg_resource sorting algorithm</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5455</link>
    <description>&lt;pre&gt;I've run into a weird issue with easy_install, that I'm trying to solve:

If I place two files named

egenix_mxodbc_connect_client-2.0.2-py2.6.egg
egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip

into the same directory and let easy_install running on Linux
scan this, it considers the second file for Windows as best
match.

Is the algorithm used for determining the best match documented
somewhere ?

I've had a look at the implementation, but this left me rather
clueless.

I thought that setuptools would prefer the .egg file over
the prebuilt .zip file - binary files being easier to install
than "source" files.

Thanks,
&lt;/pre&gt;</description>
    <dc:creator>M.-A. Lemburg</dc:creator>
    <dc:date>2013-03-12T18:15:17</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.catalog/5432">
    <title>V2 pre-PEP: transitioning to release file hosting on PYPI</title>
    <link>http://comments.gmane.org/gmane.comp.python.catalog/5432</link>
    <description>&lt;pre&gt;Hi all,

below is the new PEP pre-submit version (V2) which incorporates the
latest suggestions and aims at a rapidly deployable solution.  Thanks in
particular to Philip, Donald and Marc-Andre.  I also added a few notes
on how installers should behave with respect to non-PYPI crawling.  

I think a PEP like doc is warranted and that we should not silently
change things without proper communication to maintainers and pre-planning
the implementation/change process.  Arguably, the changes are more
invasive than "oh, let's just do a http-&amp;gt;https redirect" which didn't
work too well either.

Now, if there is some agreement, I can submit this PEP officially tomorrow,
and given agreement/refinements from the Pycon folks and the likes of
Richard, we may be able to get going very shortly after Pycon.

cheers,
holger


PEP-draft: transitioning to release-file hosting on PYPI
====================================================================

Status
-----------

PRE-SUBMIT-v2

Abstract
------------

This PEP proposes a backward-compatible transition process to speed up,
simplify and robustify installing from the pypi.python.org (PYPI)
package index.  The initial transition will put most packages on PYPI
automatically in a configuration mode which will prevent client-side
crawling from installers.  To ease automatic transition and minimize
client-side friction, **no changes to distutils or installation tools** are
required.  Instead, the transition is implemented by modifying PYPI to
serve links from ``simple/`` pages in a configurable way, preventing or
allowing crawling of non-PYPI sites for detecting release files.
Maintainers of all PYPI packages will be notified ahead of those
changes.

Maintainers of packages which currently are hosted on non-PYPI sites
shall receive instructions and tools to ease "re-hosting" of their
historic and future package release files.  The implementation of such
tools is NOT required for implementing the initial automatic transition.

Installation tools like pip and easy_install shall warn about crawling
non-PYPI sites and later default to disallow it and only allow it with
an explicit option.


History and motivations for external hosting
------------------------------------------------

When PYPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice 
to allow people to use download hosts of their choice.  This was
implemented by the PYPI ``simple/`` index containing links of type
``rel=homepage`` or ``rel=download`` which are crawled by installation
tools to discover package links.  As of March 2013, a substantial share
of packages (estimated at about 10%) makes use of this mechanism to host
files on github, bitbucket, sourceforge or their own hosting sites like
``mercurial.selenic.com``, to name just a few.

There are many reasons [2]_ why people choose to use external hosting,
to cite just a few:

- release processes and scripts have been developed already and 
  upload to external sites 

- it takes too long to upload large files from some places in the world

- export restrictions e.g. for crypto-related software

- company policies which prescribe offering open source packages through
  own sites

- problems with integrating uploading to PYPI into one's release process
  (because of release policies)

- perceived bad reliability of PYPI

- not knowing that files can be uploaded to PYPI

Irrespective of the present-day validity of these reasons, there clearly
is a history why people choose to host files externally and it even was 
for some time the only way you could do things.  


Problem
---------------

**Today, python package installers (pip and easy_install) often need to
query non-PYPI sites even if there are no externally hosted files**.
Apart from querying pypi.python.org's simple index pages, also all
homepages and download pages ever specified with any release of a
package are crawled by an installer.  The need for installers to
crawl 3rd party sites slows down installation and makes for a brittle,
unreliable installation process.  Those sites and packages also don't
take part in the :pep:`381` mirroring infrastructure, further decreasing
reliability and speed of automated installation processes around the world. 

Roughly 90% of packages are hosted directly on pypi.python.org [1]_.
Even for them, installers still need to crawl the homepage(s) of a
package.  In particular, many package uploaders are not aware that
specifying the "homepage" in their release process will slow down
the installation process for all their users.

Relying on third party sites also opens up more attack vectors
for injecting malicious packages into sites using automated installs.
A simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there.  Moreover,
performing a Man-in-The-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on the
installation site.  As many homepages and download locations are using
HTTP and not proper HTTPS, such attacks are not very hard to launch.
Such MITM attacks can happen even for packages which never intended to
host files externally as their homepages are contacted by installers
anyway.

There is currently no way for package maintainers to avoid 3rd party
crawling, other than removing all homepage/download url metadata
for all historic releases.  While a script [3]_ has been written to 
perform this action, it is not a good general solution because it removes
semantic information like the "homepage" specification from PYPI packages.


Solution
-----------

The proposed solution consists of the following implementation and
communication steps:

- Determine which packages have release files only on PYPI (group A)
  and which have externally hosted release files (group B).

- Prepare the PYPI implementation to allow a per-project "hosting mode",
  effectively enabling or disabling external crawling.  When enabled,
  nothing changes from the current situation: ``simple/`` pages keep
  producing links attributed with ``rel=download`` and ``rel=homepage``,
  causing installers to crawl those sites.
  When disabled, the attributions of links will change
  to ``rel=newdownload`` and ``rel=newhomepage``, causing installers to
  avoid crawling 3rd party sites.  Retaining the meta-information allows
  tools to still make use of the semantic information.

- Send mail to maintainers of A that their project is going to be
  automatically configured to "disable crawling" in one week,
  and encourage them to set this mode earlier to help all of
  their users.

- Send mail to maintainers of B that their package hosting mode
  is "crawling enabled", list the sites which are currently crawled,
  and suggest that they re-host their packages directly on PYPI and
  then switch the hosting mode to "disable crawling".  Provide instructions
  and, ideally, tools to help with this "re-uploading" process.
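
The per-project hosting mode from the list above could be sketched on the
server side roughly as follows.  The function and constant names are
hypothetical and not taken from the PYPI code base, the mode names "crawl",
"nocrawl" and "notset" follow the hosting-mode section of this document, and
the installer check is an assumption that still needs to be verified against
pip and easy_install.

```python
# Hypothetical sketch of the server-side link attribution; function and
# constant names are illustrative and not taken from the PYPI code base.

CRAWLED_RELS = {"homepage", "download"}


def link_rel(kind, hosting_mode):
    """Return the rel value for a 'homepage' or 'download' link."""
    if kind not in CRAWLED_RELS:
        raise ValueError("unknown link kind: %r" % (kind,))
    if hosting_mode == "nocrawl":
        # Keep the semantic information, but under a name that
        # installers are assumed not to follow.
        return "new" + kind
    # "crawl" and "notset" leave the links as they are today.
    return kind


def installer_would_crawl(rel):
    """Assumed installer behaviour: only the old rel values are crawled."""
    return rel in CRAWLED_RELS


for mode in ("notset", "crawl", "nocrawl"):
    rel = link_rel("homepage", mode)
    print(mode, rel, installer_would_crawl(rel))
```

Renaming the attribute rather than dropping the link keeps the homepage and
download information available to tools that want it.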

In addition, maintainers of installation tools are asked to release
two updates.  The first one shall provide clear warnings if external
crawling needs to happen, state for exactly which projects and URLs
this happens, and announce that crawling will be disabled by default
in the future.  The next update shall change the default to disallow
crawling, allow crawling only with an explicit option like
``--crawl-externals``, and add another option to limit which hosts
may be crawled at all.
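
A minimal sketch of how the second installer update might behave is shown
below.  The option ``--crawl-externals`` is the example name used above;
the ``--allow-hosts`` option, its comma-separated pattern syntax, the
``may_crawl`` helper and the ``installer-sketch`` program name are
assumptions made purely for illustration and do not describe the actual
pip or easy_install interfaces.

```python
import argparse
from fnmatch import fnmatch
from urllib.parse import urlparse


def build_parser():
    """Command line of a hypothetical installer with crawling off by default."""
    parser = argparse.ArgumentParser(prog="installer-sketch")
    parser.add_argument("--crawl-externals", action="store_true",
                        help="allow crawling of non-PYPI sites")
    parser.add_argument("--allow-hosts", default="*",
                        help="comma-separated host patterns that may be crawled")
    return parser


def may_crawl(url, options, index_host="pypi.python.org"):
    """Decide whether the installer is allowed to fetch the given URL."""
    host = urlparse(url).hostname or ""
    if host == index_host:
        return True  # the index itself is always allowed
    if not options.crawl_externals:
        return False  # new default: no third-party crawling
    patterns = [p.strip() for p in options.allow_hosts.split(",") if p.strip()]
    return any(fnmatch(host, pattern) for pattern in patterns)


options = build_parser().parse_args(
    ["--crawl-externals", "--allow-hosts", "*.example.org"])
print(may_crawl("http://files.example.org/pkg-1.0.tar.gz", options))
print(may_crawl("http://other.example.net/pkg-1.0.tar.gz", options))
```

Without ``--crawl-externals`` only the index host is contacted, which
matches the proposed default of the second update.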


Hosting-Mode state transitions
----------------------------------

1. At the outset, we set hosting-mode to "notset" for all packages.
   This will not change any link served via the simple index and thus
   no bad effects are expected.  Early adopters and testers may now
   change the mode to either "crawl" or "nocrawl" to help with
   streamlining issues in the PYPI implementation.

2. When maintainers of B packages are mailed, their mode is directly
   set to "crawl".

3. When maintainers of A are mailed, we leave the mode at "notset" to allow
   people to change it to "nocrawl" themselves or to set it to "crawl"
   if they think they are wrongly in the "A" group.  After a week,
   all "notset" modes are set to "nocrawl".

A week after the mailings, all packages will be in "crawl" or "nocrawl"
hosting mode.  It is then a matter of good tools and reaching out to
maintainers of B packages to increase the A/B ratio.
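
The transitions above can be sketched as a few small functions.  This is
only an illustration under the assumption that hosting modes are kept in a
simple mapping from project name to mode; the function names and project
names are made up.

```python
# Hypothetical sketch of the hosting-mode transitions described above.

def initial_modes(projects):
    """Step 1: every project starts out as 'notset'."""
    return {name: "notset" for name in projects}


def mail_group_b(modes, group_b):
    """Step 2: projects with externally hosted files are set to 'crawl'."""
    for name in group_b:
        modes[name] = "crawl"


def finalize_after_one_week(modes):
    """Step 3: whatever is still 'notset' becomes 'nocrawl'."""
    for name, mode in modes.items():
        if mode == "notset":
            modes[name] = "nocrawl"


modes = initial_modes(["alpha", "beta", "gamma"])
mail_group_b(modes, ["beta"])
modes["gamma"] = "crawl"  # a maintainer who was wrongly placed in group A
finalize_after_one_week(modes)
print(modes)
```

After the final step no project is left in "notset", so every project is in
either "crawl" or "nocrawl" mode.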

Open questions
----------------------

- Should the support tools for "rehosting" packages be implemented on the
  server side or on the client side?  Implementing them on the client
  side is probably quicker to get right and less fatal in terms of failures.

- Double-check whether ``rel=newhomepage`` and ``rel=newdownload`` cause the
  desired behaviour of pip and easy_install (both the distribute and
  setuptools based ones) to not crawl those pages.

- Are the "support tools" for re-hosting outside the scope of this PEP?

- Think some more about pip/easy_install "allow-hosts" mode etc.

References
------------

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html

.. [2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, script to remove homepage/download metadata for
       all releases, http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments
----------------------

Philip Eby for precise information and the basic ideas to
implement the transition via server-side changes only.

Donald Stufft for pushing away from external hosting, writing
the 90/10% statistics script and offering to implement a PR.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
through issues regarding getting rid of "external hosting".


Copyright
-----------------

This document has been placed in the public domain.
&lt;/pre&gt;</description>
    <dc:creator>holger krekel</dc:creator>
    <dc:date>2013-03-12T11:38:17</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.python.catalog">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.python.catalog</link>
  </textinput>
</rdf:RDF>
