
qizmt - issue #8
AELight has stopped working after issuing "qizmt @format Machines=localhost"
After successfully installing Qizmt, I opened a cmd window and issued the command
qizmt @format Machines=localhost
and got this error message:
AELight has stopped working
A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.
What version of the product are you using? On what operating system? Qizmt 1.2 on Windows 7.
Comment #1
Posted on May 12, 2010 by Quick Rabbit
More of the error message:
Unhandled Exception: System.UnauthorizedAccessException: Access to the path 'C:\Program Files (x86)\MySpace.DataMining\Qizmt\jid.dat' is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding, Int32 bufferSize)
   at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding)
   at System.IO.File.WriteAllText(String path, String contents, Encoding encoding)
   at MySpace.DataMining.AELight.AELight.AELightRun(String[] args) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 1878
   at MySpace.DataMining.AELight.AELight.Main(String[] args) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 1644
Job aborted abruptly; to clean up intermediate data and processes, issue command : Qizmt kill 3
Comment #2
Posted on May 12, 2010 by Quick Rabbit
By the way, is a login name that contains spaces, such as "My Login Name", allowed?
Comment #3
Posted on Jun 24, 2010 by Grumpy Camel
The account used to install needs read/write access to \\$\\
http://code.google.com/p/qizmt/wiki/MySpaceQizmtFAQInstallation
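One quick way to see whether the current account can actually write to the install directory is to try creating a scratch file there. The following is only a minimal Python sketch, not part of Qizmt; the function name `can_write_to` is made up for illustration, and the path in the usage lines is simply the one from the error message in Comment #1.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if the current account can create a file in `directory`."""
    try:
        # Creating (and automatically deleting) a scratch file is a direct test;
        # it avoids guessing from permission bits or ACLs.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers "access denied" as well as a missing directory.
        return False


if __name__ == "__main__":
    # Assumed path: copied from the UnauthorizedAccessException in Comment #1.
    install_dir = r"C:\Program Files (x86)\MySpace.DataMining\Qizmt"
    print(install_dir, "writable:", can_write_to(install_dir))
```

If this reports that the directory is not writable for the account that runs Qizmt, that would be consistent with the "Access to the path ... is denied" error above.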
Comment #4
Posted on Jun 26, 2010 by Quick Rabbit
I solved the problem by installing Qizmt in a directory other than the default one. However, after installing Qizmt on two machines, say shark01 and shark02, I can format them with
Qizmt @format Machines=shark01,shark02
and then run Qizmt examples, but when I run the example
Qizmt exec Qizmt-WordCount.xml
I get the error message below. Dan, could you kindly let me know how to solve it? Thank you very much in advance! By the way, both machines are running Windows 7.
C:>Qizmt exec Qizmt-WordCount.xml
Job Identifier: 58
[6/25/2010 6:53:34 PM] [Local: PrepJob] *
[6/25/2010 6:53:38 PM] Done Duration: 00:00:03
[6/25/2010 6:53:38 PM] [Remote: wordCount_LoadData] 1 processes on 2 machines: *
[6/25/2010 6:53:39 PM] Done Output: dfs://WordCount_Input.txt Duration: 00:00:02
[6/25/2010 6:53:39 PM] [MapReduce: WordCount]
Legend: m = map done; e = exchange done; s = sort done; r = reduce done
11 processes on 2 machines:
mmmmmm..
Unable to connect to DistributedObjects service on shark04: Thread exceptionSystem.Exception: Error in Open: System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.0.2:55900
   at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
   at System.Net.Sockets.Socket.Connect(String host, Int32 port)
   at MySpace.DataMining.DistributedObjects5.DistObject.Open() [Note: ensure the Windows service is running]
   at MySpace.DataMining.DistributedObjects5.DistObject.Open()
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.Open()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.ensureopen() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 702
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.firstthreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 732
Unable to connect to DistributedObjects service on shark04: Thread exceptionSystem.Exception: Error in Open: System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.0.2:55900
   at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
   at System.Net.Sockets.Socket.Connect(String host, Int32 port)
   at MySpace.DataMining.DistributedObjects5.DistObject.Open() [Note: ensure the Windows service is running]
   at MySpace.DataMining.DistributedObjects5.DistObject.Open()
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.Open()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.ensureopen() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 702
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.firstthreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 732
Unable to connect to DistributedObjects service on shark04: Thread exceptionSystem.Exception: Error in Open: System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.0.2:55900
   at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
   at System.Net.Sockets.Socket.Connect(String host, Int32 port)
   at MySpace.DataMining.DistributedObjects5.DistObject.Open() [Note: ensure the Windows service is running]
   at MySpace.DataMining.DistributedObjects5.DistObject.Open()
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.Open()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.ensureopen() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 702
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.firstthreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 732
Unable to connect to DistributedObjects service on shark04: Thread exceptionSystem.Exception: Error in Open: System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.0.2:55900
   at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
   at System.Net.Sockets.Socket.Connect(String host, Int32 port)
   at MySpace.DataMining.DistributedObjects5.DistObject.Open() [Note: ensure the Windows service is running]
   at MySpace.DataMining.DistributedObjects5.DistObject.Open()
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.Open()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.ensureopen() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 702
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.firstthreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 732
Unable to connect to DistributedObjects service on shark04: Thread exceptionSystem.Exception: Error in Open: System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.0.2:55900
   at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
   at System.Net.Sockets.Socket.Connect(String host, Int32 port)
   at MySpace.DataMining.DistributedObjects5.DistObject.Open() [Note: ensure the Windows service is running]
   at MySpace.DataMining.DistributedObjects5.DistObject.Open()
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.Open()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.ensureopen() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 702
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.firstthreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 732
[6/25/2010 6:54:01 PM] Map done; starting map exchange
.................................................
Comment #5
Posted on Jun 26, 2010 by Grumpy Camel
Thanks for the workaround on the install directory! We will look into a better default for this, or some other solution.
As for the error you are getting from running the built-in word count job:
From within your private LAN, try at the command line:
telnet 55900
If it fails, make sure that no ports above 1000 are blocked between the servers of your cluster (and only between those servers).
Also note, Qizmt should not be installed on servers which can be accessed from the internet. It is for private LAN only.
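The telnet check can also be scripted so that it is easy to repeat for every machine in the cluster. This is a rough Python sketch using only the standard `socket` module; the function names are made up for illustration, port 55900 is taken from the stack trace in Comment #4, and the host names in the usage note are assumed to be the ones mentioned in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host name did not resolve.
        return False


def check_cluster(hosts, port=55900):
    """Map each host name to True/False depending on whether the port is reachable."""
    return {host: can_connect(host, port) for host in hosts}
```

For example, `check_cluster(["shark01", "shark02"])` run from each machine in turn shows which direction, if any, is blocked. A `False` only says the TCP connection could not be opened; it does not say whether a firewall or a stopped Windows service is the cause.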
Comment #6
Posted on Jun 29, 2010 by Quick Rabbit
Hi Dan, I tried telnet 55900 at the command line, and it works OK. Also, we installed Qizmt on Windows 7 Professional. Could you kindly look into the problem and help us solve it? Thanks in advance!
Comment #7
Posted on Jun 30, 2010 by Happy Wombat
You mention formatting with hosts shark01 and shark02, but the error message is about host shark04. Be sure that you are not calling a previous install of Qizmt and that you formatted with the correct hosts. Also, ensure that Qizmt is installed in the same local directory on all hosts, e.g. on host1 c:\Qizmt\ and on host2 c:\Qizmt\
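To spot this kind of mismatch in a long console log, one could pull the host names out of the "Unable to connect" lines and compare them with the list passed to @format. The snippet below is only an illustrative Python sketch; the function name is hypothetical, and the regular expression assumes the message wording shown in Comment #4.

```python
import re

# Matches the host name in: "Unable to connect to DistributedObjects service on <host>:"
_HOST_RE = re.compile(r"Unable to connect to DistributedObjects service on ([^\s:]+):")


def unexpected_hosts(log_text, formatted_hosts):
    """Return hosts named in connection errors that are not in the formatted list."""
    expected = {h.strip().lower() for h in formatted_hosts}
    seen = {m.group(1).lower() for m in _HOST_RE.finditer(log_text)}
    return sorted(seen - expected)


if __name__ == "__main__":
    sample = "Unable to connect to DistributedObjects service on shark04: Thread exception"
    # Hosts as given to "Qizmt @format Machines=shark01,shark02" in Comment #4.
    print(unexpected_hosts(sample, ["shark01", "shark02"]))
```

An empty result means every host in the errors was part of the format command, so the problem lies elsewhere (for example, connectivity or the Windows service).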
Comment #8
Posted on Jul 2, 2010 by Quick Rabbit
Hi Chris, sorry for the confusion. I did format shark04; what I wanted to say in that message is that the format succeeds, but the job still does not work. By the way, I did install Qizmt in the same directory on all the hosts, and the OS is Windows 7 Professional. Any idea about a solution? Thanks again for taking the time to look into the problem! Happy holiday!
Comment #9
Posted on Jul 14, 2010 by Quick Rabbit
Today, when I tested, I got the following error message. Could you let me know how to solve the problem? Thanks in advance!
C:>qizmt exec For_Peter.xml
Job Identifier: 378
[7/13/2010 6:37:30 PM] [MapReduce: Job Processing: get unique lower level category - upper level category ID pairs]
Legend: m = map done; e = exchange done; s = sort done; r = reduce done
4 processes on 1 machines:
(1000 * max((1769321 / 12) / (10737418240 / 12), 1)) = 1000
ffff
[7/13/2010 6:37:34 PM] Distribution index done; starting map
mmmm
[7/13/2010 6:37:38 PM] Map done; starting map exchange
eeee[exchange completed 00:00:01].ssss[sort completed 00:00:11]rrrr
[7/13/2010 6:37:43 PM] Done Output: dfs://job_01_output.txt Duration: 00:00:13
[7/13/2010 6:37:43 PM] [MapReduce: Job Processing: get upper level category ID - keyword pair ]
Legend: m = map done; e = exchange done; s = sort done; r = reduce done
5 processes on 1 machines:
................................................................................
.m.mm.m....m
[7/13/2010 6:52:18 PM] Map done; starting map exchange
..........................ee..e.e....s.s...s.s.......r........r...r..r..........
............eexchange completed 00:13:25.Thread exception: (exchange thread) System.Exception: SortBlocks error: Sub process 0 did not return a valid response
   at MySpace.DataMining.DistributedObjects5.ArrayComboList.SortBlocks()
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.InZBlocks() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 133
   at MySpace.DataMining.AELight.AELight.MapReduceBlockInfo.exchangethreadproc() in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 958
Split count: 1
[7/13/2010 7:05:41 PM 802ms] \Shark01 DistributedObjectsSlave error: (build:3686.29033) Problem loading remote zMapBlock '\localhost\C$\Users\MapReduce\MapReduce\zmap_0_a60bbad1-6e8b-4e21-a896-9abaf0ffc8fa.j378.zm': System.Exception: Insufficient resources for this job on cluster (ZBlock value file size > ZVALUEBLOCK_LIMIT) (consider increasing sub process count)
   at MySpace.DataMining.DistributedObjects5.ArrayComboListPart.ZBlock.Add(Byte[] keybuf, Int32 keyoffset, Byte[] valuebuf, Int32 valueoffset, Int32 valuelength) in C:\SimpleSolutions\DataMining\DistributedObjects5\MySpace.DataMining.DistributedObjects.DistributedObjectsSlave\ArrayComboListSlave.cs:line 340
   at MySpace.DataMining.DistributedObjects5.ArrayComboListPart.ZMapStreamToZBlocks(Stream stm, Int64 len, String sfn, Int32 iFILE_BUFFER_SIZE, Boolean bcompresszmaps) in C:\SimpleSolutions\DataMining\DistributedObjects5\MySpace.DataMining.DistributedObjects.DistributedObjectsSlave\ArrayComboListSlave.cs:line 5378
   at MySpace.DataMining.DistributedObjects5.ArrayComboListPart.ProcessCommand(NetworkStream nstm, Char tag) in C:\SimpleSolutions\DataMining\DistributedObjects5\MySpace.DataMining.DistributedObjects.DistributedObjectsSlave\ArrayComboListSlave.cs:line 5129
System.NullReferenceException: Object reference not set to an instance of an object.
   at MySpace.DataMining.AELight.AELight._ExecOneMapReduce(String ExecOpts, Job cfgj, String[] ExecArgs, Boolean verbose, Boolean verbosereplication, List`1 AddCacheNodes, List`1 AddCacheDfsFileNames, List`1 AddCacheNodesOffsets, List`1 AddCacheNodesRecLengths) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 2477
   at MySpace.DataMining.AELight.AELight.ExecOneMapReduce(String ExecOpts, Job cfgj, String[] ExecArgs, Boolean verbose, Boolean verbosereplication) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\ExecMapReduce.cs:line 1473
   at MySpace.DataMining.AELight.AELight.Exec(String ExecOpts, SourceCode cfg, String[] ExecArgs, Boolean verbose, Boolean verbosereplication) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 421
   at MySpace.DataMining.AELight.AELight.Exec(String ExecOpts, SourceCode cfg, String[] ExecArgs, Boolean verbose) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 471
   at MySpace.DataMining.AELight.AELight.AELightRun(String[] args) in C:\SimpleSolutions\DataMining\DistributedObjects5\AELight\AELight.cs:line 3026
C:>
Comment #10
Posted on Jul 20, 2010 by Happy Wombat
The relevant part of this error message is "System.Exception: Insufficient resources for this job on cluster (ZBlock value file size > ZVALUEBLOCK_LIMIT) (consider increasing sub process count)"; it means that too much data is going to one intermediate data file.
This is addressed in the FAQ at http://code.google.com/p/qizmt/wiki/MySpaceQizmtFAQTroubleshoot#System.Exception:Insufficient_resources_for_this_job_on_cluster
One solution is to use the IntermediateDataAddressing tag in your job, as explained at http://code.google.com/p/qizmt/wiki/MySpaceQizmtReferenceIntermediateDataAddressing. This simply allows more data to go into one intermediate data file.
Comment #11
Posted on Dec 21, 2010 by Grumpy Camel
(No comment was entered for this change.)
Status: Duplicate
Labels: Type-Defect, Priority-Medium