<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict">
    <title>gmane.science.linguistics.edict-jmdict</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/534"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/533"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/532"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/531"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/530"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/529"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/528"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/527"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/526"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/525"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/524"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/523"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/522"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/521"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/520"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/519"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/518"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/517"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/516"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/515"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/534">
    <title>Re: Marking word boundaries in 外来語</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/534</link>
    <description>&lt;pre&gt;

I've decided to go ahead and blast them in. In testing, the update
script processed 15-20 updates a second. I don't want to overtax
the server, so I have put the updates in a loop with a 2-second pause
between each one, which means it will take about 5 hours to do them all.
It's not loading the server: "top" reports that idle CPU rarely
goes below 98%, and I/O wait is rarely above 0.1%.
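
For anyone wanting to do something similar, here is a minimal sketch of that kind of throttled loop in Python. It is not the script used here. apply_update is a hypothetical stand-in for whatever submits one amendment, and only the 2-second pause comes from the message. With roughly 9,100 updates (the count given in the 7 May message), the pauses alone come to 9,100 × 2 s = 18,200 s, a little over 5 hours. At 15-20 updates a second the processing time itself barely matters.

```python
import time


def apply_throttled(updates, apply_update, pause_seconds=2.0):
    """Apply updates one at a time, pausing between them to avoid overtaxing the server.

    apply_update is a hypothetical callable that submits a single amendment.
    Returns the number of updates applied.
    """
    count = 0
    for update in updates:
        if count:  # no pause before the very first update
            time.sleep(pause_seconds)
        apply_update(update)
        count += 1
    return count


def estimated_hours(n_updates, pause_seconds=2.0, seconds_per_update=0.0):
    """Rough wall-clock estimate; with fast updates the pauses dominate."""
    return n_updates * (pause_seconds + seconds_per_update) / 3600.0


print(round(estimated_hours(9100), 1))  # 5.1
```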

Cheers

Jim

&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-05-11T06:28:57</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/533">
    <title>Re: Marking word boundaries in 外来語</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/533</link>
    <description>&lt;pre&gt;More on automatically adding dotted readings for 外来語 entries.

On 26 April 2013 22:51, Jim Breen &amp;lt;jimbreen-Re5JQEeQqe8AvxtiuMwx3w&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:


I've gone over them quite a bit. It's now about 9,100 entries. I'm a lot
happier with it now.


With help from Stuart, that's now working. I'm not sure I want to add
9,000 amendments in one session, although it might be best simply to get
it done. The amendment summary report will be vast (see
http://www.edrdg.org/jmdictdb/cgi-bin/updates.py?svc=jmdict&amp;amp;y=2013&amp;amp;m=05&amp;amp;d=07),
but that probably can't be helped.

Cheers

Jim

&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-05-07T03:46:52</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/532">
    <title>Re: Marking word boundaries in 外来語</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/532</link>
    <description>&lt;pre&gt;Jim.Breen:

Hi.Jim,
This.is.very.cool!
&lt;/pre&gt;</description>
    <dc:creator>Nils Roland Barth</dc:creator>
    <dc:date>2013-04-26T15:04:38</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/531">
    <title>Marking word boundaries in 外来語</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/531</link>
    <description>&lt;pre&gt;Greetings,

As all of you will know, written Japanese has the practice (not always followed)
of marking the word boundaries in multi-word 外来語 with a middle dot (・), as
was sometimes done in ancient Latin. I encourage people creating or amending
entries in JMdict to do the same, having both unsegmented and segmented
versions, e.g. タイトスカート and タイト・スカート. For a variety of reasons it
is sometimes useful to have both versions available.

There are many entries without word boundaries marked, so I have just pumped
all the multi-word 外来語 through some software to create additional versions with
the word boundaries marked. The software is the same as used in the server
at http://hum.csse.unimelb.edu.au/~jwb/gairaigo.html
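
The software mentioned above is not reproduced here. As a much simpler illustration of producing a dotted version from an undotted headword, here is a minimal Python sketch that uses greedy longest-match segmentation against a small word list. This is a swapped-in technique, not the method used by the gairaigo server, which per the March announcement includes a MeCab/Unidic-based option. The lexicon in the usage note is a hypothetical toy.

```python
def segment(word, lexicon):
    """Greedy longest-match segmentation of a katakana string.

    Returns a list of segments, or None if some part of word
    cannot be matched against lexicon (a set of known words).
    """
    longest = max(len(w) for w in lexicon)
    parts, i = [], 0
    while i != len(word):
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(min(len(word), i + longest), i, -1):
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # no lexicon entry starts at position i
    return parts


def dotted(word, lexicon):
    """Return the nakaguro-marked form, e.g. タイト・スカート, or None."""
    parts = segment(word, lexicon)
    if parts is None or len(parts) == 1:
        return None  # unknown, or already a single word
    return "・".join(parts)
```

With the toy lexicon {"タイト", "スカート", "タイ"}, dotted("タイトスカート", ...) gives タイト・スカート, and removing the ・ recovers the undotted headword. Greedy matching can choose a wrong split when a longer but incorrect match exists, which is one reason output like this needs checking by hand.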

I have put the results at http://www.csse.monash.edu.au/~jwb/extradots.html
There are about 9,500 entries.

Each time I look at them I see an error or two, so please let me know if
you notice any.

Next task is to see about a script to add them to the data&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-04-26T12:51:05</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/530">
    <title>Re: Kanjidic and JLPT kanji levels</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/530</link>
    <description>&lt;pre&gt;
Thanks for the comments. I think I'll leave them as-is, and make a note in the
documentation.

Jim

&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-04-20T02:12:06</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/529">
    <title>Re: Kanjidic and JLPT kanji levels</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/529</link>
    <description>&lt;pre&gt;Jim:

It’s probably easiest and most accurate to leave as is,
and leave it up to client programs how to display them;
I would likely display them as “Old JLPT 2” etc.
(Someone else might list them as N5, N4, N2, N1, say.)

Apps are going to have to make this decision *anyway*,
unless they show raw tags, so we’re only making life easier
for app programmers using KanjiDic, who should understand it
well enough.

Formally, the old test levels were 4, 3, 2, 1,
while the new levels are N5, N4, N3, N2, N1,
which disambiguates via an “N” prefix.

In practice people refer to the levels without the prefix,
hence the confusion.

People are going to be confused anyway, as there are no JLPT N3 kanji,
so it’s probably clearer to force the issue and insist that
people understand “these are old lists; the tests have changed since”.

For anyone studying for the JLPT, there are countless
resources (online and print) that list vocab, kanji, etc.;
I don’t think we need to feel a responsibility to explain
about the tests.

Gi&lt;/pre&gt;</description>
    <dc:creator>Nils Roland Barth</dc:creator>
    <dc:date>2013-04-19T16:23:44</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/528">
    <title>Re: Kanjidic and JLPT kanji levels</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/528</link>
    <description>&lt;pre&gt;I think it would be better to leave the entries as-is and let people
determine on their own which kanji from which JLPT level they should
study. Everyone's going to have their own definition of which ones
should appear in which of the new JLPT levels. In the long run it would
probably be more convenient for people to use the old levels as a
reference point since those levels were better defined by the Japan
Foundation/Japan Educational Exchanges and Services.
-Matthew

--- In edict-jmdict-hHKSG33TihhbjbujkaE4pw&amp;lt; at &amp;gt;public.gmane.org, Jim Breen  wrote:

&lt;/pre&gt;</description>
    <dc:creator>iammatthewmiller</dc:creator>
    <dc:date>2013-04-19T13:37:59</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/527">
    <title>Kanjidic and JLPT kanji levels</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/527</link>
    <description>&lt;pre&gt;As some of you will know, the JLPT has changed from having 4 levels
of tests to 5 levels. The actual change involved inserting a new intermediate
level below level 2, and renumbering the lower levels; 3-&amp;gt;4 and 4-&amp;gt;5.
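
For client programs that keep the existing old-level tags (1-4), here is a minimal Python sketch of the correspondence described above. The table and function names are hypothetical and not part of Kanjidic. Old level 2 deliberately maps to two candidates because, as noted below, there is no authoritative way to split it. The label follows the "Old JLPT 2" style suggested later on the list.

```python
# Old JLPT level maps to the candidate new levels: a new level was
# inserted below old level 2, and old 3 and 4 were renumbered to 4 and 5.
OLD_TO_NEW = {
    1: ("N1",),
    2: ("N2", "N3"),  # no authoritative split of the old level-2 kanji
    3: ("N4",),
    4: ("N5",),
}


def new_level_candidates(old_level):
    """Return the new-format levels an old-level tag could correspond to."""
    return OLD_TO_NEW[old_level]


def display_label(old_level):
    """Label the tag explicitly as an old level rather than guessing a new one."""
    return f"Old JLPT {old_level}"
```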

At the same time, they announced they were no longer going to publish
lists of kanji and vocab for each level - "Therefore, we decided that
publishing "Test Content Specifications" containing a list of vocabulary,
kanji and grammar items was not necessarily appropriate."
See: http://www.jlpt.jp/e/faq/index.html#anchor28

At present in Kanjidic I have tags assigning kanji to the old levels 1-4.
It would be easy to amend the levels 3 and 4 tagging, but there is no clean
solution for splitting the 739 kanji which were listed for level 2. Some
sites are offering lists for the 5 levels, but it seems the people running
the sites are making their own split. Nothing authoritative is available.

Since a lot of apps, etc. draw on kanjidic, I think it's important not to
put out misleading information. I'm &lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-04-19T08:05:16</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/526">
    <title>Quiet for two weeks</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/526</link>
    <description>&lt;pre&gt;Hi,

I won't be able to do much editorial stuff for the next two weeks. My
wife's conference in Matsumoto (during which I had a bit of spare time)
finished yesterday, and we're off this morning to start some tripping
down the old villages of the Nakasendou, then on to Mie-ken, with some days
in Hong Kong next week.

Thanks to everyone who has been working away at things. It's been
interesting seeing how my new Android tablet has worked out doing
edits. It's actually been fine - the main frustrations have been with
cutting/pasting which is a far cry from the mouse-intensive X-Windows
style I am used to with Unix/Linux. The other frustration is that although
the tablet has a keyboard, which makes heavy-duty keying much easier(*),
the Japanese IME I installed (Simeji) only works on-screen,
so I have to detach the keyboard to type Japanese.

The other thing I don't have is my huge battery of hacked EB glossaries,
dictionaries, etc. They are text files, and although I have console-like
functions with Android, &lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-04-01T00:19:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/525">
    <title>Re: Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/525</link>
    <description>&lt;pre&gt;

I strongly recommend installing Perl in a user directory. Download the
source code from perl.org, untar perl-5.whatever and then run the
"Configure" script to have it installed in a local directory. You just need
to answer the one question (I think) that the script asks about where to install
Perl, and it is good to go. I use a directory /home/ben/software/install/ as
a substitute for /usr/local/. It means I don't need to use the root account
to install modules, and I don't have to worry about tangling with the
"system perl".
&lt;/pre&gt;</description>
    <dc:creator>Ben Bullock</dc:creator>
    <dc:date>2013-03-27T08:47:06</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/524">
    <title>Re: Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/524</link>
    <description>&lt;pre&gt;
Thanks for that. I'll look at it in detail when I'm back in Oz.
I've not had a lot of success running Perl modules using cpan,
but I think the environment on arakawa.edrdg.org is OK. Time
will tell.

Jim
&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-03-27T08:21:20</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/523">
    <title>Re: Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/523</link>
    <description>&lt;pre&gt;I've now put all the bits and pieces of this system for finding
duplicate gairaigo online:

https://github.com/benkasminbullock/edict-dups

The instructions for running the program are in the file README. If you
don't want to use "git" to download it, it can be downloaded from

https://github.com/benkasminbullock/edict-dups/tags
&lt;/pre&gt;</description>
    <dc:creator>Ben Bullock</dc:creator>
    <dc:date>2013-03-25T13:08:04</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/522">
    <title>Re: Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/522</link>
    <description>&lt;pre&gt;
I have turned Ben's list into a WWW page with links to the
database edits for each entry:

http://www.csse.monash.edu.au/~jwb/fixgairaigo.html

I removed the ones that have been merged already, plus
the ones I thought were not the same term.

Feel free to do some merging/deleting (I'm not sure I'll be able to
prune the file while I'm away. I'm travelling with a tablet, and although
I have a login client installed, I don't think I can edit text files with
Japanese in them.)

Cheers

Jim



&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-03-23T07:23:07</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/521">
    <title>Re: Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/521</link>
    <description>&lt;pre&gt;

I hand-edited it to remove many of them. I only noticed there were a
lot left after I sent the mail.
&lt;/pre&gt;</description>
    <dc:creator>Ben Bullock</dc:creator>
    <dc:date>2013-03-23T02:00:06</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/520">
    <title>Re: Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/520</link>
    <description>&lt;pre&gt;
Nice list. Very useful, thanks.

A few false positives, e.g. カタール/カタル and タイガー/タイガ,
but otherwise excellent.

I'm running out of time, but it would be good to sort these out.

Jim

&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-03-22T23:38:34</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/519">
    <title>Re: Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/519</link>
    <description>&lt;pre&gt;As a test of a computer program I have been working on, I made a similar list:

https://gist.github.com/benkasminbullock/5221020

This has about 200 entries, a noticeable number of them from {comp}.
&lt;/pre&gt;</description>
    <dc:creator>Ben Bullock</dc:creator>
    <dc:date>2013-03-22T13:00:12</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/518">
    <title>Duplicated gairaigo</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/518</link>
    <description>&lt;pre&gt;Greetings,

I was poking around among some 外来語 entries yesterday and noticed
quite a few were effectively duplicated, in that versions existed with and
without a trailing "ー", e.g. ゼネラルコントラクター and ゼネラルコントラクタ.
I put in corrections for a couple.

I just ran a simple script to track them down and found that there are over
100 such duplicates, and even more when you include embedded cases
such as アイデンティティクライシス and アイデンティティークライシス.

Add in the cases where イ is used instead of ー, and you get even more, e.g.
there are distinct entries for クリエータ, クリエーター and クリエイター.
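
Here is a minimal Python sketch of one way such a script might group the variants. It is an assumption, not the script actually used. It simply drops every ー and treats エイ like エー, so it will also flag false positives, such as pairs that differ only in vowel length (e.g. カタール/カタル). Every group therefore still needs a human check.

```python
from collections import defaultdict


def variant_key(katakana):
    """Collapse the long-vowel variations described above into one key.

    Rough heuristic (an assumption):
    - every long-vowel mark ー is dropped, so クリエーター and クリエータ match;
    - エイ is treated like エー, so クリエイター matches as well.
    """
    return katakana.replace("ー", "").replace("エイ", "エ")


def find_duplicates(headwords):
    """Group headwords whose variant keys collide; return groups of two or more."""
    groups = defaultdict(list)
    for hw in headwords:
        groups[variant_key(hw)].append(hw)
    return [sorted(g) for g in groups.values() if len(g) != 1]
```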

Fixing these means two database changes each - expand one entry and
delete the other. I intend to chip away at them over coming days, and rather
than deluge the amendment process with relatively trivial changes, I propose
to self-approve them. The changes will be visible, e.g. via summaries such as
http://www.edrdg&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-03-21T22:41:34</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/517">
    <title>Re: The use of the pipe (|) character in Tatoeba's wwwjdic.csv file</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/517</link>
    <description>&lt;pre&gt;

It's an artefact from the days when Paul Blay maintained the sentences
and indices using an MS-Access database. His system had some problems
disambiguating things like は and ハ, so he added that "|n" trailer to make
them different. I delete them when I make the WWW index file from the
"wwwjdic.csv", which is why I didn't mention them on that Wiki page.

I have added a brief explanation on that page now.
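
Here is a minimal Python sketch of deleting those "|n" trailers when building an index file. It assumes each trailer is a "|" immediately followed by digits, and that "|" is not used for anything else in the line. The function name is hypothetical.

```python
import re

# Matches a "|" followed by one or more digits, e.g. the "|1" in "は|1".
PIPE_TRAILER = re.compile(r"\|[0-9]+")


def strip_pipe_trailers(index_line):
    """Remove every "|n" trailer from one line of sentence-index text."""
    return PIPE_TRAILER.sub("", index_line)
```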

I'd delete them all using the edit GUI, but it has a limit on the
number it can edit in one go, and there are too many of these "|"
for it to handle.

HTH

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-03-15T22:47:54</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/516">
    <title>Re: The use of the pipe (|) character in Tatoeba's wwwjdic.csv file</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/516</link>
    <description>&lt;pre&gt;On 2013-03-15 17:06, iammatthewmiller wrote:

Hi,

"&amp;lt;some headword&amp;gt;|&amp;lt;an integer&amp;gt;" is used when a headword appears in more
than one entry.
In that case, the integer indicates which entry is referred to (1 means the
entry with the lowest seq_id, 2 means the second lowest, and so on).
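
Here is a minimal Python sketch of that lookup. It assumes the entries are available as (seq_id, headword) pairs. It also assumes that a bare headword with no "|n" simply takes the lowest seq_id; that second point is not stated above. The function names are hypothetical, and the seq_ids in the usage note are made up.

```python
from collections import defaultdict


def build_resolver(entries):
    """Map each headword to its entries' seq_ids, sorted ascending.

    entries is an iterable of (seq_id, headword) pairs; a headword
    may occur in several entries, hence the list.
    """
    by_headword = defaultdict(list)
    for seq_id, headword in entries:
        by_headword[headword].append(seq_id)
    return {hw: sorted(ids) for hw, ids in by_headword.items()}


def resolve(token, resolver):
    """Resolve "headword|n" (or a bare headword) to a single seq_id.

    n is 1-based: |1 is the lowest seq_id for that headword,
    |2 the second lowest, and so on.
    """
    headword, _, n = token.partition("|")
    ids = resolver[headword]
    return ids[int(n) - 1] if n else ids[0]
```

For example, with hypothetical entries [(30, "は"), (10, "は"), (20, "犬")], resolve("は|2", ...) picks seq_id 30 and resolve("は|1", ...) picks 10.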

JL


&lt;/pre&gt;</description>
    <dc:creator>Jean-Luc Léger</dc:creator>
    <dc:date>2013-03-15T22:05:16</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/515">
    <title>The use of the pipe (|) character in Tatoeba's wwwjdic.csv file</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/515</link>
    <description>&lt;pre&gt;I'm working on splitting up the Tatoeba example sentences CSV file into
a few SQL databases. In many of the sentence indices, I see the use of
either "は|1" or "&amp;lt;some word&amp;gt;|2" to indicate...something...about
that particle or word. The page on EDRDG.org that describes the format
of each entry (http://www.edrdg.org/wiki/index.php/Sentence-Dictionary_Linking)
lists the meaning of all of the special characters except for the
pipe-followed-by-a-single-digit. What is the significance of the number
after the pipe, and why is it only either 1 or 2?
Thanks,
-Matt
&lt;/pre&gt;</description>
    <dc:creator>iammatthewmiller</dc:creator>
    <dc:date>2013-03-15T16:06:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/514">
    <title>Gairaigo segmenter and translator</title>
    <link>http://permalink.gmane.org/gmane.science.linguistics.edict-jmdict/514</link>
    <description>&lt;pre&gt;Greetings,

One of my research projects has involved techniques for splitting
long gairaigo (usually compound nouns) into their components and
generating translations for them. I aim to use the tools I have developed
to go over the gairaigo in JMdict and add the nakaguro versions as
appropriate. Longer term, I aim to build them into a system for trawling
the WWW and other corpora looking for undocumented gairaigo.

I have put a couple of tools online as a WWW server, and people are
welcome to try them out (and give feedback). One is a system I built last
year, and the other uses the MeCab morphological analyzer with the most
recent Unidic lexicon. The MeCab/Unidic combination does a remarkably
good job, and shows how much such systems have moved on from treating
long katakana strings as single morphemes.

The URL is: http://hum.csse.unimelb.edu.au/~jwb/gairaigo.html

Cheers

Jim

&lt;/pre&gt;</description>
    <dc:creator>Jim Breen</dc:creator>
    <dc:date>2013-03-07T00:31:37</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.science.linguistics.edict-jmdict">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.science.linguistics.edict-jmdict</link>
  </textinput>
</rdf:RDF>
