
Re: [Xen-devel] Outreachy project - Xen Code Review Dashboard



Hi Jesus,

I have a version of the task running. I'd love it if you could take a
look and let me know if there are any changes you'd like to see.


It fetches the mailboxes, analyzes them with Perceval and an
implementation of jwz's well-known threading algorithm, and stores
the results in Elasticsearch.
Each document in ES is a message, with its id being the
Message-ID and its type being a modified Subject line from the
first message in its thread.
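
A minimal sketch of how such documents could be assembled. It is not the
script described above: it uses a much-simplified stand-in for jwz's
algorithm (it only follows the last References entry, or In-Reply-To, up
to a root, and skips jwz's subject-based grouping and container pruning).
The input is assumed to be plain dicts of header fields; the index name
`xen-devel-messages` and the `thread_id` / `thread_subject` fields are
hypothetical. Newer Elasticsearch releases no longer support mapping
types, so the thread subject is stored here as a field rather than a type.

```python
import re


def normalize_subject(subject):
    """Strip leading Re:/Fwd: markers and collapse whitespace."""
    stripped = re.sub(r'^(\s*(re|fwd?)\s*:\s*)+', '', subject or '',
                      flags=re.IGNORECASE)
    return ' '.join(stripped.split())


def parent_id(msg):
    """Most likely parent: last References entry, else In-Reply-To."""
    refs = (msg.get('References') or '').split()
    if refs:
        return refs[-1]
    reply_to = (msg.get('In-Reply-To') or '').split()
    return reply_to[0] if reply_to else None


def find_root(msg_id, parents):
    """Follow parent links upwards; the seen set guards against cycles."""
    seen = {msg_id}
    while True:
        parent = parents.get(msg_id)
        if parent is None or parent in seen:
            return msg_id
        seen.add(parent)
        msg_id = parent


def build_actions(messages, index='xen-devel-messages'):
    """Return one bulk action per message, keyed by Message-ID."""
    by_id = {m['Message-ID']: m for m in messages}
    parents = {mid: parent_id(m) for mid, m in by_id.items()}
    actions = []
    labels = {}
    for msg in messages:
        root = find_root(msg['Message-ID'], parents)
        # Label the thread with the root's subject when the root is in the
        # archive, otherwise with the first message seen for that thread.
        first = by_id.get(root, msg)
        labels.setdefault(root, normalize_subject(first.get('Subject')))
        source = dict(msg, thread_id=root, thread_subject=labels[root])
        actions.append({'_index': index, '_id': msg['Message-ID'],
                        '_source': source})
    return actions
```

The returned dicts follow the action format accepted by
`elasticsearch.helpers.bulk`, so they could be passed to it together with
a client instance.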

I hope this is what was intended for the task!

PS - Should I continue copying these messages to the
whole xen-devel mailing list, or is sending them to you
sufficient?

Thanks!

Heather

On Mon, Apr 17, 2017 at 2:04 AM, Jesus M. Gonzalez-Barahona <jgb@xxxxxxxxxxxx> wrote:
On Sun, 2017-04-16 at 21:26 -0700, Heather Booker wrote:
> Hi Jesus!
>
> I appreciate the info on the unicode error. I might have missed it,
> but I also asked about the general microtask specifications. Here
> was my original inquiry:
> > And to clarify, my understanding is that the final result of
> > this task is an index of Xen data, with two types: commits and
> > messages.
> > Each commit document should contain its original information
> > from git, plus the name of the branch it was developed in. And
> > should only the mbox messages which appear to be associated
> > with a specific commit exist in the final index? Is there some
> > key information in messages that is supposed to indicate the
> > association of a given commit with a git branch? I would be
> > grateful if you could specify the end goal a little more. :D
>
> Yeah, so overall I'm not sure I understand the relationship of
> branches to the mailing list messages. Is this to be a simple
> string parsing task wherein I should scan the message body
> for the word "branch"? (I am guessing not ;P)

I'm sorry, I understood that text to be about the project, not about
the microtask. The microtask is about either:

* Producing an ES index with messages labeled by thread (by applying a
threading algorithm to messages retrieved from archives), or

* Producing an ES index with commits labeled by branch (by following
refs and parents information in the output produced by Perceval).

In the complete project, both will be used to produce the final indexes
that power the code review dashboard.
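
As a rough illustration of the second option, here is one possible
reading of "following refs and parents": start at each branch tip and
walk the parent links, labeling every commit reached. The input shapes
(a dict of commit hash to parent hashes, and a dict of branch name to tip
hash) and the function name are assumptions made for this sketch, not
Perceval's actual output format. Note that this yields every branch that
contains a commit, which is broader than the branch it was developed in.

```python
from collections import defaultdict


def label_commits_by_branch(parents, branch_tips):
    """Map each commit hash to the set of branches it is reachable from.

    parents:     {commit_hash: [parent_hash, ...]}
    branch_tips: {branch_name: tip_commit_hash}
    """
    labels = defaultdict(set)
    for branch, tip in branch_tips.items():
        stack = [tip]
        while stack:
            commit = stack.pop()
            if branch in labels[commit]:
                continue  # already visited for this branch
            labels[commit].add(branch)
            # Merge commits have several parents; follow all of them.
            stack.extend(parents.get(commit, []))
    return dict(labels)


# Hypothetical history: f1 branches off c2, master continues to c3.
history = {'c1': [], 'c2': ['c1'], 'c3': ['c2'], 'f1': ['c2']}
tips = {'master': 'c3', 'feature': 'f1'}
branches = label_commits_by_branch(history, tips)
```

Each commit document could then carry its set of branch names as an extra
field before being uploaded to the index.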

> I will be happy to get back on developing once I better grasp
> the goal! :)

Is it clearer now?

If you want, let's schedule an IRC slot to clarify whatever is
still unclear.

        Jesus.

> Thanks!
>
> Heather
>
> On Sun, Apr 16, 2017 at 4:23 PM, Jesus M. Gonzalez-Barahona <jgb@bitergia.com> wrote:
> > On Thu, 2017-04-13 at 00:47 -0700, Heather Booker wrote:
> > > Hi,
> > >
> > > I submitted an application for this code review dashboard and
> > > would love to keep working on the microtask once I get some
> > > more info. :)
> >
> > Great! I answered your message, could you progress with the task?
> >
> > > I also came up with a general idea of how the project might be
> > > split up - any feedback on this would be welcome! I wrote:
> > >
> > > "As said by Jesus, the big picture of this project will be porting
> > > everything behind the current code review dashboard to use
> > > Grimoire Lab tools, from the current state of using
> > > MetricsGrimoire and custom scripts. I expect this would involve
> > > Perceval for analyzing data, and Grimoire Elk may be useful in
> > > further stages, or may be too general - this is something I would
> > > wish to explore.
> > > This project will also involve a migration from SQL to
> > > Elasticsearch - because I believe the relevant data is mostly / all
> > > available in places online, I am unsure whether this would need to
> > > be a direct migration. However, looking at the current SQL setup
> > > would be beneficial to understanding the desired format of the
> > > Elasticsearch indexes.
> > > I would love to dive into this project and have 3 main parts -
> > > getting data into ES, turning it into dashboard displays, and then
> > > fine tuning and perhaps augmenting the dashboard to improve its
> > > usefulness.
> > > Getting data into ES may seem simple but I believe that once it
> > > needs to be used for the dashboard, many realizations will pop up
> > > - thus I’d like to leave maybe 2-3 weeks for that first step, 6-7
> > > weeks for the visualizations (which will include querying the
> > > data), and the final 3 weeks for touch ups and improvements."
> >
> > The plan could be sound, but would need some tweaks once your skills
> > in Python are clear, which could be the main blocker for the first
> > stages.
> >
> > > Does this sound like an accurate summary and reasonable timeline?
> > > And I am guessing from Jesus's involvement with the threads
> > > that Jesus would be the mentor, is that correct? :)
> >
> > Yes, I would be ;-)
> >
> >         Jesus.
> >
> > > Thanks!
> > >
> > > Heather
> > >
> > >
> > > On Sun, Apr 9, 2017 at 9:50 PM, Heather Booker <heather.j.booker@gmail.com> wrote:
> > > > Hi Jesus,
> > > >
> > > > While using the Elasticsearch python library
> > > > (https://elasticsearch-py.readthedocs.io/en/master/) to add mbox
> > > > messages to an index, I would get a UnicodeEncodeError:
> > > > "'utf-8' codec can't encode character '\udca0' in position 767:
> > > > surrogates not allowed".
> > > >
> > > > Investigating in Grimoire elk
> > > > https://github.com/grimoirelab/GrimoireELK/blob/96b00bc682485976104a6825ca63ae08639deacc/grimoire_elk/elk/mbox.py#L200
> > > > seems to show that perhaps that tool instead uses Latin-1
> > > > encoding, but I found that to then produce a serialization error
> > > > (their custom error message: "Unable to serialize %r (type: %s)").
> > > > I suppose this is because now it's bytes; of course, converting
> > > > back to string after encoding just cycles back to the first error.
> > > >
> > > > As somewhat of a Python newbie I don't really know how to tackle
> > > > this! My thought atm is to splice the offending character out
> > > > of the message.
> > > >
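A possible way around the error quoted above, offered as a sketch rather
than as what GrimoireELK does. The character '\udca0' is a lone surrogate
of the kind Python's `surrogateescape` error handler produces for the raw
byte 0xA0; that the text was decoded this way is an assumption here.
Instead of splicing the character out, the original byte can be recovered
and decoded with a fallback encoding. The function name and the Latin-1
fallback are hypothetical choices, and a run of escaped bytes that formed
one multi-byte character is decoded byte by byte in this simple version.

```python
def repair_surrogates(text, fallback='latin-1'):
    """Replace surrogate-escaped bytes so the text can be encoded as UTF-8."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # surrogateescape maps byte 0x80+n to U+DC80+n; undo that
            # mapping and decode the single byte with the fallback encoding.
            out.append(bytes([code - 0xDC00]).decode(fallback, errors='replace'))
        elif 0xD800 <= code <= 0xDFFF:
            # Any other lone surrogate cannot be recovered; mark it.
            out.append('\ufffd')
        else:
            out.append(ch)
    return ''.join(out)


# Reproduce the situation: byte 0xA0 decoded with surrogateescape.
bad = b'price:\xa0100'.decode('ascii', errors='surrogateescape')
fixed = repair_surrogates(bad)
encoded = fixed.encode('utf-8')  # no longer raises UnicodeEncodeError
```

Applying this to every string field before indexing keeps the message
text intact while making it safe to serialize as UTF-8 JSON.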
> > > > And to clarify, my understanding is that the final result of this
> > > > task is an index of Xen data, with two types: commits and messages.
> > > > Each commit document should contain its original information
> > > > from git, plus the name of the branch it was developed in. And
> > > > should only the mbox messages which appear to be associated
> > > > with a specific commit exist in the final index? Is there some
> > > > key information in messages that is supposed to indicate the
> > > > association of a given commit with a git branch? I would be
> > > > grateful if you could specify the end goal a little more. :D
> > > >
> > > > Thanks so much!
> > > >
> > > > Heather
> > > >
> > > >
> > > >
> > > > On Sat, Apr 8, 2017 at 10:02 AM, Jesus M. Gonzalez-Barahona <jgb@bitergia.com> wrote:
> > > > > On Fri, 2017-04-07 at 15:49 -0700, Heather Booker wrote:
> > > > > > Hi Jesus, 
> > > > > >
> > > > > > Thanks for your reply!
> > > > > >
> > > > > > So about the task, instructions say after analyzing mboxes with
> > > > > > Perceval to "store the resulting raw index in ElasticSearch" -
> > > > > > what does raw index mean?
> > > > >
> > > > > In this context, I mean "storing the JSON documents produced by
> > > > > Perceval in an ElasticSearch index, as such". ElasticSearch
> > > > > stores JSON documents, so it is just uploading the output of
> > > > > Perceval to it.
> > > > >
> > > > > > In terms of figuring out the elasticsearch structure, do I want
> > > > > > an index (xen-devel mbox) with a type (message) and each object
> > > > > > from the perceval output to be one document? Or should it be
> > > > > > more fine-grained?
> > > > >
> > > > > Exactly.
> > > > >
> > > > > Saludos,
> > > > >
> > > > >         Jesus.
> > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Heather
> > > > > >
> > > > > > On Thu, Apr 6, 2017 at 7:05 AM, Jesus M. Gonzalez-Barahona <jgb@bitergia.com> wrote:
> > > > > > > On Wed, 2017-04-05 at 16:43 -0700, Heather Booker wrote:
> > > > > > > > Hi!
> > > > > > > >
> > > > > > > > I'd love to work on the Code Review Dashboard project for this
> > > > > > > > round of Outreachy.
> > > > > > >
> > > > > > > Great!!
> > > > > > >
> > > > > > > > Are the steps outlined here
> > > > > > > > http://markmail.org/message/7adkmords3imkswd still the first
> > > > > > > > contribution you'd like to see?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > So is this a project that has been worked on in previous rounds
> > > > > > > > of GSOC/Outreachy also?
> > > > > > > > If so is there a place to find links to the previous
> > > > > > > > participants' blogs? :)
> > > > > > >
> > > > > > > No. We had one participant at some point, who couldn't even
> > > > > > > start for personal reasons. There are some people considering
> > > > > > > working on this for this next round of Outreachy, however.
> > > > > > > You'll see their messages in this mailing list.
> > > > > > >
> > > > > > > > Should questions about the specifications/completion of the
> > > > > > > > microtask be addressed to IRC or this list? If IRC, which
> > > > > > > > channel - #xen-opw or #metrics-grimoire? On that note, I'm
> > > > > > > > curious why #metrics-grimoire is the listed channel on the
> > > > > > > > project page - are main contributors involved in both
> > > > > > > > projects? Or is it just because the Xen dashboard doesn't
> > > > > > > > have a channel?
> > > > > > >
> > > > > > > The code review is for the Xen project, but it is done with (I
> > > > > > > mean, the software used for it is) GrimoireLab, which for
> > > > > > > historical reasons uses the #metrics-grimoire channel. That's
> > > > > > > why it is likely that you will find somebody from the project
> > > > > > > there.
> > > > > > >
> > > > > > > If you have questions, and find me around in IRC, please ping
> > > > > > > me. If I'm not available, please send an email message.
> > > > > > >
> > > > > > > Saludos,
> > > > > > >
> > > > > > >         Jesus.
> > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Heather
> > > > > > > > _______________________________________________
> > > > > > > > Xen-devel mailing list
> > > > > > > > Xen-devel@xxxxxxxxxxxxx
> > > > > > > > https://lists.xen.org/xen-devel
> > > > > > > --
> > > > > > > Bitergia: http://bitergia.com
> > > > > > > /me at Twitter: https://twitter.com/jgbarah
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>
>
--
Bitergia: http://bitergia.com
/me at Twitter: https://twitter.com/jgbarah


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

