
qizmt - issue #8

AELight has stopped working after issuing "qizmt @format Machines=localhost"


Posted on May 12, 2010 by Quick Rabbit

After successfully installing Qizmt, I opened a cmd window and issued the command

qizmt @format Machines=localhost

and got this error message:

AELight has stopped working

A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.

What steps will reproduce the problem? 1. 2. 3.

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

Qizmt 1.2 on Windows 7

Please provide any additional information below.

Comment #1

Posted on May 12, 2010 by Quick Rabbit

More of the error message:

Unhandled Exception: System.UnauthorizedAccessException: Access to the path 'C:\Program Files (x86)\MySpace.DataMining\Qizmt\jid.dat' is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding, Int32 bufferSize)
   at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding)
   at System.IO.File.WriteAllText(String path, String contents, Encoding encoding)
   at MySpace.DataMining.AELight.AELight.AELightRun(String[] args) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 1878
   at MySpace.DataMining.AELight.AELight.Main(String[] args) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 1644

Job aborted abruptly; to clean up intermediate data and processes, issue command : Qizmt kill 3

Comment #2

Posted on May 12, 2010 by Quick Rabbit

By the way, is a login name that contains spaces allowed? For example:

My Login Name

Comment #3

Posted on Jun 24, 2010 by Grumpy Camel

the account used to install needs read/write access to \\$\\

http://code.google.com/p/qizmt/wiki/MySpaceQizmtFAQInstallation
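One way to confirm the access problem before reinstalling is to test whether the current account can create a file in the install directory. The following is only a minimal sketch, not part of Qizmt: the helper name `can_write_to` is hypothetical, and the directory to test is whatever install path you pass in.

```python
import tempfile


def can_write_to(directory: str) -> bool:
    """Return True if the current account can create a file in `directory`.

    Hypothetical helper for illustration; it is not a Qizmt command.
    """
    try:
        # Creating a temporary file (deleted again on close) needs a
        # similar file-creation permission to the jid.dat write that failed.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers access denied as well as a missing directory.
        return False


# Example call, using the install directory from the error message:
# can_write_to(r"C:\Program Files (x86)\MySpace.DataMining\Qizmt")
```

If this returns False for the default install directory under a non-elevated account, that would be consistent with the UnauthorizedAccessException reported above.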

Comment #4

Posted on Jun 26, 2010 by Quick Rabbit

I solved the problem by installing Qizmt in a directory other than the default one. However, after installing Qizmt on two machines, say shark01 and shark02, I can format them by doing

Qizmt @format Machines=shark01,shark02

and then run Qizmt examples, but when I run the example

Qizmt exec Qizmt-WordCount.xml

I get the error message below. Dan, could you kindly let me know how to solve it? Thank you very much in advance! By the way, both machines are running Windows 7.

C:>Qizmt exec Qizmt-WordCount.xml
Job Identifier: 58
[6/25/2010 6:53:34 PM] [Local: PrepJob]
*
[6/25/2010 6:53:38 PM] Done
Duration: 00:00:03

[6/25/2010 6:53:38 PM] [Remote: wordCount_LoadData] 1 processes on 2 machines: *

[6/25/2010 6:53:39 PM] Done Output: dfs://WordCount_Input.txt Duration: 00:00:02

[6/25/2010 6:53:39 PM] [MapReduce: WordCount]
Legend: m = map done; e = exchange done; s = sort done; r = reduce done
11 processes on 2 machines:
mmmmmm..
Unable to connect to DistributedObjects service on shark04: Thread exception
System.Exception: Error in Open: System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.0.2:55900
   at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
   at System.Net.Sockets.Socket.Connect(String host, Int32 port)
   at MySpace.DataMining.DistributedObjects5.DistObject.Open() [Note: ensure the Windows service is running]
   at MySpace.DataMining.DistributedObjects5.DistObject.Open()
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.Open()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.ensureopen() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 702
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.firstthreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 732

[The same error and stack trace appear four more times in the output.]

[6/25/2010 6:54:01 PM]        Map done; starting map exchange

.................................................

Comment #5

Posted on Jun 26, 2010 by Grumpy Camel

Thanks for the workaround on the install directory! We will look into a better default for this, or another solution.

As far as the error you are getting from running the built-in word count job:

From within your private LAN, try at the command line:

telnet 55900

If it fails, make sure that no ports above 1000 are blocked between the servers of your cluster (and only between those servers).

Also note, Qizmt should not be installed on servers which can be accessed from the internet. It is for private LAN only.
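The same reachability test can be scripted so that it is easy to repeat for every machine in the cluster. This is a rough sketch, not a Qizmt tool: `port_is_reachable` is a hypothetical helper, the port 55900 is taken from the error message above, and the host names in the example call are the ones mentioned in this thread.

```python
import socket


def port_is_reachable(host: str, port: int = 55900, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper; 55900 is the port shown in the error above.
    """
    try:
        # create_connection resolves the host name and attempts a TCP
        # connect, which is roughly what the telnet test does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and name-resolution failures
        # all end up here.
        return False


# Example call from one cluster machine:
# for host in ["shark01", "shark02"]:
#     print(host, port_is_reachable(host))
```

Running it from each machine toward every other machine would show whether a firewall blocks the port in only one direction.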

Comment #6

Posted on Jun 29, 2010 by Quick Rabbit

Hi, Dan, I tried telnet 55900 at the command line, and it works OK. Also, we installed Qizmt on Windows 7 Professional. Could you kindly look into the problem and help us solve it? Thanks in advance!

Comment #7

Posted on Jun 30, 2010 by Happy Wombat

You mention formatting with hosts shark01 and shark02, but the error message is about host shark04. Be sure that you are not calling a previous install of Qizmt and that you formatted with the correct hosts. Also, ensure that Qizmt is installed in the same local directory on all hosts, e.g. on host1 c:\Qizmt\ and on host2 c:\Qizmt\
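A mismatch like this can be spotted mechanically by comparing the hosts named in the error output with the hosts passed to @format. The sketch below assumes only the error wording quoted earlier in this thread; the function name and the regular expression are illustrative, not part of Qizmt.

```python
import re

# Matches the host name in lines such as
# "Unable to connect to DistributedObjects service on shark04: ..."
UNREACHABLE = re.compile(
    r"Unable to connect to DistributedObjects service on (\S+?):"
)


def unexpected_hosts(log_text, formatted_hosts):
    """Return hosts named in connection errors that were not formatted."""
    expected = {h.strip().lower() for h in formatted_hosts}
    seen = {h.lower() for h in UNREACHABLE.findall(log_text)}
    return sorted(seen - expected)


log = "Unable to connect to DistributedObjects service on shark04: Thread exception"
print(unexpected_hosts(log, ["shark01", "shark02"]))  # → ['shark04']
```

An empty result would mean every failing host was in the formatted list, which points to a connectivity or service problem rather than a stale cluster configuration.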

Comment #8

Posted on Jul 2, 2010 by Quick Rabbit

Hi, Chris, sorry for the confusion. I did format shark04; what I meant to say in that message is that the format was successful, but it still does not work. By the way, I did install Qizmt in the same directory on all the hosts, and the OS is Windows 7 Professional. Any idea about the solution? Thanks again for taking the time to look into the problem! Happy holiday!

Comment #9

Posted on Jul 14, 2010 by Quick Rabbit

Today, when I tested, I got the following error message. Could you let me know how to solve the problem? Thanks in advance!

C:>qizmt exec For_Peter.xml
Job Identifier: 378
[7/13/2010 6:37:30 PM] [MapReduce: Job Processing: get unique lower level category - upper level category ID pairs]
Legend: m = map done; e = exchange done; s = sort done; r = reduce done
4 processes on 1 machines:
(1000 * max((1769321 / 12) / (10737418240 / 12), 1)) = 1000
ffff
[7/13/2010 6:37:34 PM] Distribution index done; starting map
mmmm
[7/13/2010 6:37:38 PM] Map done; starting map exchange
eeee[exchange completed 00:00:01].ssss[sort completed 00:00:11]rrrr

[7/13/2010 6:37:43 PM] Done Output: dfs://job_01_output.txt Duration: 00:00:13

[7/13/2010 6:37:43 PM] [MapReduce: Job Processing: get upper level category ID - keyword pair ]
Legend: m = map done; e = exchange done; s = sort done; r = reduce done
5 processes on 1 machines:
.................................................................................m.mm.m....m
[7/13/2010 6:52:18 PM] Map done; starting map exchange
..........................ee..e.e....s.s...s.s.......r........r...r..r......................eexchange completed 00:13:25.
Thread exception: (exchange thread) System.Exception: SortBlocks error: Sub process 0 did not return a valid response
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.SortBlocks()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.InZBlocks() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 133
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.exchangethreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 958

Split count: 1
[7/13/2010 7:05:41 PM 802ms] \Shark01 DistributedObjectsSlave error: (build:3686.29033) Problem loading remote zMapBlock '\localhost\C$\Users\MapReduce\MapReduce\zmap_0_a60bbad1-6e8b-4e21-a896-9abaf0ffc8fa.j378.zm': System.Exception: Insufficient resources for this job on cluster (ZBlock value file size > ZVALUEBLOCK_LIMIT) (consider increasing sub process count)
   at MySpace.DataMining.DistributedObjects5.ArrayComboListPart.ZBlock.Add(Byte[] keybuf, Int32 keyoffset, Byte[] valuebuf, Int32 valueoffset, Int32 valuelength) in C:\SimpleSolutions\DataMining\DistributedObjects5\MySpace.DataMining.DistributedObjects.DistributedObjectsSlave\ArrayComboListSlave.cs:line 340
   at MySpace.DataMining.DistributedObjects5.ArrayComboListPart.ZMapStreamToZBlocks(Stream stm, Int64 len, String sfn, Int32 iFILE_BUFFER_SIZE, Boolean bcompresszmaps) in C:\SimpleSolutions\DataMining\DistributedObjects5\MySpace.DataMining.DistributedObjects.DistributedObjectsSlave\ArrayComboListSlave.cs:line 5378
   at MySpace.DataMining.DistributedObjects5.ArrayComboListPart.ProcessCommand(NetworkStream nstm, Char tag) in C:\SimpleSolutions\DataMining\DistributedObjects5\MySpace.DataMining.DistributedObjects.DistributedObjectsSlave\ArrayComboListSlave.cs:line 5129

System.NullReferenceException: Object reference not set to an instance of an object.
   at MySpace.DataMining.AELight.AELight._ExecOneMapReduce(String ExecOpts, Job cfgj, String[] ExecArgs, Boolean verbose, Boolean verbosereplication, List`1 AddCacheNodes, List`1 AddCacheDfsFileNames, List`1 AddCacheNodesOffsets, List`1 AddCacheNodesRecLengths) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 2477
   at MySpace.DataMining.AELight.AELight.ExecOneMapReduce(String ExecOpts, Job cfgj, String[] ExecArgs, Boolean verbose, Boolean verbosereplication) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 1473
   at MySpace.DataMining.AELight.AELight.Exec(String ExecOpts, SourceCode cfg, String[] ExecArgs, Boolean verbose, Boolean verbosereplication) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 421
   at MySpace.DataMining.AELight.AELight.Exec(String ExecOpts, SourceCode cfg, String[] ExecArgs, Boolean verbose) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 471
   at MySpace.DataMining.AELight.AELight.AELightRun(String[] args) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 3026

C:>

Comment #10

Posted on Jul 20, 2010 by Happy Wombat

The relevant part of this error message is "System.Exception: Insufficient resources for this job on cluster (ZBlock value file size > ZVALUEBLOCK_LIMIT) (consider increasing sub process count)" and it means too much data is going to one intermediate data file.

This is addressed in the FAQ at http://code.google.com/p/qizmt/wiki/MySpaceQizmtFAQTroubleshoot#System.Exception:Insufficient_resources_for_this_job_on_cluster

One solution is to use the IntermediateDataAddressing tag in your job, as explained at http://code.google.com/p/qizmt/wiki/MySpaceQizmtReferenceIntermediateDataAddressing. This will simply allow more data to go into one intermediate data file.

Comment #11

Posted on Dec 21, 2010 by Grumpy Camel

(No comment was entered for this change.)

Status: Duplicate

Labels:
Type-Defect Priority-Medium