Re: [Xen-devel] Regarding Outreachy project on Improving CR Dashboard



On Tue, 2016-04-05 at 22:05 +0530, Priya wrote:
> Hello all, 
> 
> I have completed coding the initial task of grouping the email threads
> using the Zawinski algorithm and then adding a "property" attribute to
> the JSON for the messages that belong to the same email thread. 
> 
> You can see my git repo [1]. The new.json is the output of my script
> and out.json is the output of Perceval. 
> 
> Also, I have updated the README.md file regarding the execution
> procedures in github.
> 
> Instructions
> ============
> 
> git clone https://github.com/priya299/Dashboard.git
> 
> cd Dashboard
> 
> python createjson.py 'Perceval output file' 'mbox file' 'output_file'
> 
> eg: python createjson.py out.json xen-devel-2016-03 new.json
> 
> The JSON file "new.json" will be created, with each message that
> belongs to a thread having an additional attribute, "property". The
> "property" attribute holds the message ID of the first message in the
> thread.
> 
> Now, I will be pushing new.json into the Elasticsearch DB [2].
> Please give me your valuable feedback about my progress. 
> 
> [1]:https://github.com/priya299/Dashboard
> [2]:https://www.elastic.co/guide/en/kibana/3.0/import-some-data.html

Hi, Priya. To begin with, could you please integrate your code with the
Perceval iterator? That is, you can run Perceval on the mailing list
archive directly from your code, which makes "out.json" unnecessary.
That way, the invocation of the script would be more like:

python createjson.py xen-devel-2016-03 new.json

In other words, createjson.py would use Perceval to parse the mailing
list archive. To this end, the Perceval mbox backend is a class which,
once instantiated, provides an iterator method, fetch(), that you can
run in a loop. In each iteration of the loop, you get the equivalent of
one JSON element in out.json.

The code would be similar to:

-------------------------------
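# Import the submodule explicitly: a bare "import perceval" does not
# guarantee that perceval.backends.mbox is loaded.
import perceval.backends.mbox

mbox_parser = perceval.backends.mbox.MBox(
  origin=mbox_url,        # origin of the archive, e.g. the list URL
  dirpath=mbox_file_name  # path where the mbox files are stored
)
# fetch() yields one parsed message per iteration
for item in mbox_parser.fetch():
  thread_id = find_thread(item)  # placeholder for your threading code
  ...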
---------------------------------
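The find_thread() call above is only a placeholder for your own
threading code. Just to illustrate the idea, here is a much simplified
sketch (not the full Zawinski algorithm) that finds the first message of
a thread from the References and In-Reply-To headers. It assumes that
each item behaves like a dict keyed by header name, and that messages
arrive in chronological order; the names thread_root and roots are
hypothetical.

```python
def thread_root(item, roots):
    """Return the Message-ID of the first message in item's thread.

    `item` is assumed to be a dict keyed by header name.
    `roots` maps each Message-ID seen so far to its thread root.
    """
    msg_id = item["Message-ID"]
    references = (item.get("References") or "").split()
    in_reply_to = (item.get("In-Reply-To") or "").split()

    if references:
        # By convention, the first reference is the oldest ancestor.
        parent = references[0]
    elif in_reply_to:
        parent = in_reply_to[0]
    else:
        # No ancestors: this message starts its own thread.
        parent = msg_id

    # If the parent was already seen, reuse its root; otherwise the
    # parent itself is the best root candidate we know of.
    root = roots.get(parent, parent)
    roots[msg_id] = root
    return root


# Tiny example with made-up messages, in chronological order.
messages = [
    {"Message-ID": "<a@example>"},
    {"Message-ID": "<b@example>", "In-Reply-To": "<a@example>"},
    {"Message-ID": "<c@example>", "References": "<a@example> <b@example>"},
    {"Message-ID": "<d@example>", "In-Reply-To": "<c@example>"},
    {"Message-ID": "<e@example>"},
]
roots = {}
threads = [thread_root(m, roots) for m in messages]
```

Inside the fetch() loop, the returned value is what you would store in
the "property" attribute of each message.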

Some details about the Perceval mbox class:

http://perceval.readthedocs.org/en/master/perceval.backends.html#module-perceval.backends.mbox

If you have trouble running the Perceval backend as an iterator, please
let me know.

In addition, you can use argparse for reading the arguments in the
command line. It is easy and convenient.
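
A minimal sketch of how that could look, assuming the two positional
arguments from the invocation suggested earlier (the mbox archive and
the output file); the function name parse_args and the argument names
are only illustrative:

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Add a thread identifier to each message of an "
                    "mbox archive and write the result as JSON."
    )
    parser.add_argument("mbox", help="mbox archive to parse")
    parser.add_argument("output", help="JSON file to write")
    return parser.parse_args(argv)


# Equivalent to: python createjson.py xen-devel-2016-03 new.json
args = parse_args(["xen-devel-2016-03", "new.json"])
```

In the real script you would call parse_args() with no argument, so that
the values come from the command line. You also get a --help option for
free.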

Saludos,

        Jesus.

-- 
Bitergia: http://bitergia.com
/me at Twitter: https://twitter.com/jgbarah
